Yi Xing, PhD, leads the Center for Computational and Genomic Medicine at Children's Hospital of Philadelphia

CHOP researcher Dr. Xing develops more accurate computational tool for long-read RNA sequencing

The tool, called ESPRESSO, will allow for better diagnosis of rare genetic diseases caused by disrupted RNA and for the discovery of potential therapeutic targets in diseases like cancer

On the journey from gene to protein, a nascent RNA molecule can be cut and joined, or spliced, in different ways before being translated into a protein. This process, known as alternative splicing, allows a single gene to encode several different proteins. Alternative splicing occurs in many biological processes, like when stem cells mature into tissue-specific cells. In the context of disease, however, alternative splicing can be dysregulated. Therefore, it is important to examine the transcriptome – that is, all the RNA molecules that might stem from genes – to understand the root cause of a condition.

However, historically it has been difficult to "read" RNA molecules in their entirety because they are usually thousands of bases long. Instead, researchers have relied on so-called short-read RNA sequencing, which breaks RNA molecules and sequences them in much shorter pieces – somewhere between 200 and 600 bases, depending on the platform and protocol. Computer programs are then used to reconstruct the full sequences of RNA molecules. Short-read RNA sequencing can give highly accurate sequencing data, with a low per-base error rate of approximately 0.1% (meaning one base is incorrectly determined for every 1,000 bases sequenced). Nevertheless, it is limited in the information it can provide due to the short length of the sequencing reads. In many ways, short-read RNA sequencing is like breaking a large picture into jigsaw pieces that are all the same shape and size and then trying to piece the picture back together.

Recently, "long-read" platforms that can sequence RNA molecules over 10,000 bases in length end-to-end have become available. These platforms do not require RNA molecules to be broken up before being sequenced, but they have a much higher per-base error rate, typically between 5% and 20%. This well-known limitation has severely hampered the widespread adoption of long-read RNA sequencing. In particular, the high error rate has made it difficult to determine the validity of novel, previously unknown RNA molecules discovered in a particular condition or disease.

To circumvent this problem, researchers at the Children's Hospital of Philadelphia (CHOP) have developed a new computational tool that can more accurately discover and quantify RNA molecules from these error-prone long-read RNA sequencing data. The tool, called ESPRESSO (Error Statistics PRomoted Evaluator of Splice Site Options), was reported today in Science Advances.

"Long-read RNA sequencing is a powerful technology that will allow us to uncover RNA variation in rare genetic diseases and other conditions, like cancer," said Yi Xing, Ph.D., director of the Center for Computational and Genomic Medicine at CHOP and senior author of the study. "We are probably at an inflection point in how we discover and analyze RNA molecules. The transition from short-read to long-read RNA sequencing represents an exciting technological transformation, and computational tools that reliably interpret long-read RNA sequencing data are urgently needed."

ESPRESSO can accurately discover and quantify different RNA molecules from the same gene – known as RNA isoforms – using error-prone long-read RNA sequencing data alone. To do so, the computational tool compares all long RNA sequencing reads of a given gene to its corresponding genomic DNA and then uses the error patterns of individual long reads to confidently identify splice junctions – places where the nascent RNA molecule has been cut and joined – as well as their corresponding full-length RNA isoforms. By finding areas of perfect matches between long RNA sequencing reads and genomic DNA, as well as borrowing information across all long RNA sequencing reads of a gene, the tool can identify highly reliable splice junctions and RNA isoforms, including those that have not been previously documented in existing databases.
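The core anchoring idea – trusting a splice junction only when the read matches the genome perfectly on both sides of it, then corroborating that junction across all reads of the gene – can be illustrated with a toy sketch. This is a simplified illustration, not ESPRESSO's actual algorithm; the function names and the pre-aligned string representation are hypothetical:

```python
from collections import Counter

def junction_has_perfect_anchor(read_seq, genome_seq, junction_pos, window=10):
    """Toy check: trust a candidate splice junction only if the read
    matches the genome perfectly for `window` bases on each side of it.
    `read_seq` and `genome_seq` are assumed to be pre-aligned strings."""
    left_match = (read_seq[junction_pos - window:junction_pos]
                  == genome_seq[junction_pos - window:junction_pos])
    right_match = (read_seq[junction_pos:junction_pos + window]
                   == genome_seq[junction_pos:junction_pos + window])
    return left_match and right_match

def vote_junctions(alignments, min_reads=2, window=10):
    """Borrow information across all reads of a gene: keep only junctions
    with perfectly anchored support in at least `min_reads` reads."""
    votes = Counter()
    for read_seq, genome_seq, candidate_junctions in alignments:
        for pos in candidate_junctions:
            if junction_has_perfect_anchor(read_seq, genome_seq, pos, window):
                votes[pos] += 1
    return {pos for pos, n in votes.items() if n >= min_reads}
```

Because random sequencing errors rarely reproduce the same perfect flanking match in multiple reads, cross-read voting of this kind separates genuine junctions from alignment artifacts.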

The researchers evaluated the performance of ESPRESSO using simulated data and data from real biological samples. They found that ESPRESSO performs better than multiple currently available tools, both in terms of discovering RNA isoforms and quantifying them. The researchers also generated and analyzed over 1 billion long RNA sequencing reads covering 30 human tissue types and three human cell lines, providing a useful resource for studying human transcriptome variation at the resolution of full-length RNA isoforms.

"ESPRESSO addresses a long-standing problem of long-read RNA sequencing and could usher in discovery opportunities," Dr. Xing said. "We envision that ESPRESSO will be a useful tool for researchers to explore the RNA repertoire of cells in various biomedical and clinical settings."

This work was supported in part by the Immuno-Oncology Translational Network (IOTN) of the National Cancer Institute's Cancer Moonshot Initiative (U01CA233074), other National Institutes of Health funding (R01GM088342, R01GM121827, and R56HG012310), along with a National Institutes of Health T32 Training Grant in Computational Genomics (T32HG000046).

Gao et al. "ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data," Science Advances, January 20, 2023, DOI: 10.1126/sciadv.abq5072

l-r: Filippo Martinelli and Bram Nap, both PhD students at the University of Galway Molecular Systems Physiology group; Professor Ines Thiele, Professor of Systems Biomedicine, and Dr Ronan Fleming, Associate Professor in Medicine at University of Galway.

Irish scientists build simulations based on a database of digital microbes called AGORA2

Digital microbe database unlocks patient response to treatment for diseases such as Parkinson’s and colorectal cancer

Researchers at the University of Galway associated with APC Microbiome Ireland, a world-leading SFI Research Centre, have created a resource of over 7,000 digital microbes – enabling computer simulations of how drug treatments work and how patients may respond. The resource is a milestone in the scientific understanding of human response to medical treatment, as it offers the opportunity for computer simulations and predictions of differences in metabolism between individuals, including for diseases such as inflammatory bowel disease, Parkinson's, and colorectal cancer.

The database – called AGORA2 – builds on the expertise developed in creating the first resource of digital microbes, known as AGORA1. AGORA2 encompasses 7,203 digital microbes, created based on experimental knowledge from scientific publications, with a particular focus on drug metabolism.

The resource has been built by a team of scientists at the University of Galway’s Molecular Systems Physiology group, led by APC Microbiome Ireland principal investigator Professor Ines Thiele.

The team’s research aims to advance precision medicine by using computational modeling. 

Professor Thiele explained: “AGORA2 is a milestone towards personalized, predictive computer simulations enabling the analysis of person-microbiome-drug interactions for precision medicine applications.

“Humans are hosting a myriad of microbes. Just like us, these microbes eat and interact with their environment. Considering that we are all unique, each of us hosting an individual microbiome, our metabolism is also expected to vary between individuals.

“The insight provided by the database of digital microbes presents a healthcare opportunity to harness individual differences in metabolism to provide personalized, improved treatments in ‘precision medicine’, compared to a currently more general ‘one-size-fits-all’ approach.

“Besides our food, our microbiomes also metabolize the medicines we take. The same drug may therefore manifest diverse effects in disparate people because of the differences in metabolism performed by the different microbiomes.”

Using the digital microbe resource AGORA2, computer simulations have shown that drug metabolism varies significantly between individuals, as driven by their microbiomes. 
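Simulations of this kind are typically constraint-based: each digital microbe is a genome-scale metabolic network, and flux balance analysis (FBA) finds reaction fluxes that maximize an objective subject to steady-state mass balance (S·v = 0). A minimal FBA sketch on a hypothetical three-reaction toy network – not an actual AGORA2 model – using SciPy:

```python
import numpy as np
from scipy.optimize import linprog

# Toy network: drug uptake -> conversion -> metabolite secretion.
# Columns are reactions (uptake, convert, secrete); rows are internal
# metabolites (drug, metabolite). Steady state requires S @ v = 0.
S = np.array([
    [1, -1,  0],   # drug: produced by uptake, consumed by conversion
    [0,  1, -1],   # metabolite: produced by conversion, then secreted
])
bounds = [(0, 10), (0, None), (0, None)]  # uptake capacity caps the system

# Maximize the secretion flux; linprog minimizes, so negate the objective.
c = np.array([0, 0, -1])
res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
print(res.x)  # optimal flux distribution, limited by the uptake bound
```

In a real AGORA2 simulation the stoichiometric matrix has thousands of reactions per strain and the bounds encode diet and drug availability, but the optimization has this same linear-programming shape.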

Uniquely, the AGORA2-based computer simulations enabled the identification of microbes and metabolic processes for individual drugs that correlated with observations in a clinical setting.

The research was published today in Nature Biotechnology. 

The team at the University of Galway demonstrated that AGORA2 enables personalized, strain-resolved modeling by predicting the drug conversion potential of the gut microbiomes from 616 colorectal cancer patients and controls, which greatly varied between individuals and correlated with age, sex, body mass index, and disease stages. This means that the team can create digital representations and predictions specific to the divergent microbes.

Professor Thiele added: “Knowledge of our individual microbiomes and their drug-metabolizing capabilities represents a precision medicine opportunity to tailor drug treatments to an individual to maximize health benefits while minimizing side effects.

“By using AGORA2 in computer simulations, our team has shown that the resulting metabolic predictions achieved superior performance compared to what was possible to date.”

Professor Paul Ross, Director of APC Microbiome Ireland, said: “This research is a perfect illustration of the power of computational approaches to enhance our understanding of the role of microbes in health and disease – significantly, this digital platform will be a fantastic resource that could lead to the development of novel personalized therapeutic approaches which take the microbiome into account.”

This work was led by the University of Galway and completed as part of a collaboration between many international institutions, including the University of Lorraine, and the University of Medicine Greifswald.

Architecture diagram of deep-learning. (Image by IOCAS)

IOCAS Prof. HU Shijian uses deep learning to predict ITF; provides an important tool for researching climate change

Scientists from the Institute of Oceanology of the Chinese Academy of Sciences (IOCAS) and Nanjing University of Information Science and Technology have constructed an inference and prediction system for the Indonesian Throughflow (ITF) using deep learning and achieved valid prediction of ITF transport. The study was published in Frontiers in Marine Science on Jan. 16.

The Indonesian seas are the only ocean channel connecting the tropical ocean basins. The ITF is a crucial ocean dynamic factor for the inter-basin exchange between the Indian Ocean basin and the Pacific Ocean basin. The ITF transports large amounts of material and energy and hence plays a significant role in the material and energy balance of the Indo-Pacific Ocean and in regional and global climate change. However, prediction of the ITF has mainly relied on numerical simulation systems, which often have significant model biases and great uncertainty.

Given this, the researchers led by Prof. HU Shijian put forward the idea of combining satellite observations with artificial intelligence methods to construct the inference and prediction system of the ITF, and conducted experiments with various deep-learning models.

The Indo-Pacific pressure gradient is the main driving factor of ITF, so researchers used sea surface heights between the Indian and Pacific Ocean basins to infer and predict the transport of ITF. They trained the convolutional neural network (CNN) using the massive data provided by the Coupled Model Intercomparison Project Phase 6 model and Simple Ocean Data Assimilation data sets and reconstructed a time series of ITF transport.

The training results showed that the system based on the CNN model reproduces about 90% of the total variance of ITF transport, indicating that the system can achieve valid inference of ITF transport. 
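A figure like "reproduces about 90% of the total variance" corresponds to an explained-variance score between reconstructed and reference transport. A minimal sketch with synthetic, illustrative series (not the study's data):

```python
import numpy as np

def explained_variance(observed, predicted):
    """Fraction of the variance of `observed` captured by `predicted`:
    1 - Var(observed - predicted) / Var(observed)."""
    return 1.0 - np.var(observed - predicted) / np.var(observed)

# Synthetic monthly ITF transport series (in Sverdrups), for illustration only.
rng = np.random.default_rng(0)
months = np.linspace(0, 12 * np.pi, 360)
observed = 15 + 2 * np.sin(months)                       # idealized seasonal cycle
predicted = observed + rng.normal(scale=0.6, size=360)   # imperfect inference
print(round(explained_variance(observed, predicted), 2))
```

A score near 1 means the reconstruction tracks nearly all of the variability in the reference series; a score of 0.9 corresponds to the "about 90%" reported for the CNN-based system.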

The researchers further combined the system with the satellite data from 1993 to 2021 to infer and construct the time series of the ITF and found that the time series was in good agreement with internationally renowned field observations of the ITF. They explored the possibility of predicting the ITF with this AI system, and the results show that the system can make a valid prediction with a lead time of seven months.

"The ITF AI inference and prediction system provides an important tool for researching ocean circulation and climate change in the Indo-Pacific Ocean, which may alleviate the pressure of field ocean observation to some extent," said Prof. HU.  

This work was supported by the Natural Science Foundation of Shandong Province and the Strategic Pilot Science and Technology Special Project of CAS. 

Artistic depiction of a pair of holes caused by the magnetic background of the system. C. Hohmann/MCQST

MPQ-built quantum simulator allows the first microscopic observation of charge carriers pairing

A team of German researchers at the MPQ has for the first time monitored in an experiment how holes (positive charge carriers) in a solid-state model combined to form pairs. This process could play an important role in understanding high-temperature superconductivity. 

Using a quantum simulator, researchers at the Max Planck Institute of Quantum Optics (MPQ) have observed pairs of charge carriers that may be responsible for the resistance-free transport of electric current in high-temperature superconductors. So far, the exact physical mechanisms in these complex materials are still largely unknown. Theories assume that the cause for the pair formation, and thus for the phenomenon of superconductivity, lies in magnetic forces. The team in Garching has now for the first time been able to demonstrate pairs that are formed this way. Their experiment was based on a lattice-like arrangement of cold atoms, as well as on an ingenious suppression of the movement of free charge carriers.

Since the discovery of high-temperature superconductors almost 40 years ago, scientists have been trying to track down their fundamental quantum-physical mechanisms. But the complex materials still pose mysteries. The new findings of a team in the Quantum Many-Body Systems Department at MPQ in Garching now provide new microscopic insight into processes that may underlie these so-called unconventional superconductors.

Crucial to any kind of superconductivity is the formation of tightly linked pairs of charge carriers - electrons or holes, as electron vacancies are called. "The reason for this lies in quantum mechanics," explains MPQ physicist Sarah Hirthe: each electron or hole carries a half-integer spin – a quantum physical quantity that can be imagined as a measure of a particle's internal rotation. Atoms also have a spin. For quantum statistical reasons, however, only particles with an integer spin can move through a crystal lattice without resistance under certain conditions. "Therefore, electrons or holes have to pair up to do this," says Hirthe. In conventional superconductors, lattice vibrations called phonons help with pairing. In non-conventional superconductors, on the other hand, a different mechanism is at work – but the question of which one it is has remained unanswered until now. "In a widely accepted theory, indirect magnetic forces play a crucial role," Sarah Hirthe reports. "But this could not be confirmed in experiments so far."

Solid-state model spiked with holes: binding mechanism in a magnetically ordered system. The red and blue spheres are spins of opposite orientations. © MPQ

To better understand the processes in such materials, the researchers used a quantum simulator: a kind of quantum computer that recreates physical systems. To do this, they arranged ultracold atoms in a vacuum with laser light in such a way that they simulate the electrons in a simplified solid-state model. In the process, the spins of the atoms arranged themselves with alternating directions: an antiferromagnetic structure was created, which is characteristic of many high-temperature superconductors – and stabilized by magnetic interactions. The team then "doped" this model by reducing the number of atoms in the system. In that way, holes emerged in the lattice-like structure.

The team at MPQ could now show that the magnetic forces indeed lead to pairs. To achieve this, they used an experimental trick. "Moving charge carriers in a material like high-temperature superconductors are subject to a competition of different forces," explains Hirthe. On the one hand, they have the urge to spread out, i.e. to be everywhere at the same time. This gives them an energetic advantage.

On the other hand, magnetic interactions ensure a regular arrangement of the spin states of atoms, electrons, and holes – and presumably the formation of charge carrier pairs. However: "The competition of forces has so far prevented us from observing such pairs microscopically," says Timon Hilker, leader of the research group. "That's why we had the idea of preventing the disruptive movement of the charge carriers in one spatial direction."

A close look through the quantum gas microscope

This way, the magnetic forces were, to a large extent, undisturbed. The result: holes that came close to each other formed the expected pairs. To observe such pairing, the team used a quantum gas microscope – a device with which quantum mechanical processes can be followed in detail. Not only were the hole pairs revealed, but the relative arrangement of the pairs was also observed, suggesting repulsive forces between them. The team reports on their work in the scientific journal "Nature". "The results underline the idea that the loss of electrical resistance in non-conventional superconductors is caused by magnetic forces," emphasizes Prof. Immanuel Bloch, Director at MPQ and Head of the Quantum Many-Body Systems Division. "This leads to a better understanding of these extraordinary materials and shows a new way of how stable hole pairs can form even at very high temperatures, potentially significantly increasing the critical temperature of superconductors."

The researchers at the Max Planck Institute of Quantum Optics now plan new experiments on more complex models in which large two-dimensional arrays of atoms are connected. Such larger systems will hopefully create more hole pairs and allow for observing their motion through the lattice: the transport of electric current without resistance due to superconductivity.

Credit: DECaPS2/DOE/FNAL/DECam/CTIO/NOIRLab/NSF/AURA Image processing: M. Zamani & D. de Martin (NSF’s NOIRLab)

DECaPS2 survey produces more than 10 terabytes of data on the majesty of the Milky Way

A new astronomical survey is a portrait of gargantuan proportions. It shows the staggering number of stars bristling among the wispy bands of dust in our home galaxy, the Milky Way. The heart of our galaxy — the central bulge of bright blue stars that also contains the supermassive black hole Sagittarius A* — is at the left side of this panorama.

This galactic panorama was captured by the Dark Energy Camera (DECam) instrument on the Víctor M. Blanco 4-meter Telescope at Cerro Tololo Inter-American Observatory (CTIO), a Program of NSF's NOIRLab. CTIO is a constellation of international astronomical telescopes perched atop Cerro Tololo in Chile at an altitude of 2,200 meters (7,200 feet). CTIO's lofty vantage point gives astronomers an unrivaled view of the southern celestial hemisphere, which allowed DECam to capture the southern Galactic plane in such detail.

The data used to create this survey originate from the second release of the Dark Energy Camera Plane Survey (DECaPS2), a survey of the plane of the Milky Way as seen from the southern sky taken at optical and near-infrared wavelengths. The new data is described today in The Astrophysical Journal Supplement.

"One of the main reasons for the success of DECaPS2 is that we simply pointed at a region with an extraordinarily high density of stars and were careful about identifying sources that appear nearly on top of each other," says Andrew Saydjari, a graduate student at Harvard University, a researcher at the Center for Astrophysics | Harvard & Smithsonian, and lead author of the paper. "Doing so allowed us to produce the largest catalog ever from a single camera, in terms of the number of objects observed."

The first trove of data from DECaPS was released in 2017. With the addition of the new data, the survey now covers 6.5 percent of the night sky and spans a staggering 130 degrees in length. While it might sound modest, this equates to 13,000 times the angular area of the full Moon.
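The quoted factor can be checked with back-of-the-envelope arithmetic: 6.5 percent of the roughly 41,253 square degrees of the whole sky, divided by the angular area of the full Moon (apparent radius about 0.26 degrees), lands on the order of 13,000:

```python
import math

SKY_AREA_SQ_DEG = 41_253      # total area of the celestial sphere
MOON_RADIUS_DEG = 0.26        # apparent radius of the full Moon

survey_area = 0.065 * SKY_AREA_SQ_DEG       # ~2,680 square degrees
moon_area = math.pi * MOON_RADIUS_DEG ** 2  # ~0.21 square degrees
print(survey_area / moon_area)              # on the order of 13,000
```

The flat-disk approximation of the Moon's area is adequate here, since its apparent diameter is only about half a degree.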

"When combined with images from Pan-STARRS 1, DECaPS2 completes a 360-degree panoramic view of the Milky Way's disk and additionally reaches much fainter stars," says Edward Schlafly, a researcher at the AURA-managed Space Telescope Science Institute and a co-author of the paper describing DECaPS2, published in The Astrophysical Journal Supplement. "With this new survey, we can map the three-dimensional structure of the Milky Way's stars and dust in unprecedented detail."

Gathering the data required to cover this much of the night sky was a Herculean task; the DECaPS2 survey identified 3.32 billion objects from over 21,400 individual exposures. Its two-year run, which involved about 260 hours of observations, produced more than 10 terabytes of data.

Most of the stars and dust in the Milky Way are located in its spiral disk — the bright band stretching across this image. While this profusion of stars and dust makes for beautiful images, it also makes the galactic plane challenging to observe. The dark tendrils of dust seen threading through this image absorb starlight and blot out fainter stars entirely, and the light from diffuse nebulae interferes with any attempts to measure the brightness of individual objects. Another challenge arises from the sheer number of stars, which can overlap in the image and make it difficult to disentangle individual stars from their neighbors.

Despite the challenges, astronomers delved into the galactic plane to gain a better understanding of our Milky Way. By observing near-infrared wavelengths, they were able to peer past much of the light-absorbing dust. The researchers also used an innovative data-processing approach, which allowed them to better predict the background behind each star. This helped to mitigate the effects of nebulae and crowded star fields on such large astronomical images, ensuring that the final catalog of processed data is more accurate.

"Since my work on the Sloan Digital Sky Survey two decades ago, I have been looking for a way to make better measurements on top of complex backgrounds," said Douglas Finkbeiner, a professor at the Center for Astrophysics, co-author of the paper, and principal investigator behind the project. "This work has achieved that and more!"

"This is quite a technical feat. Imagine a group photo of over three billion people and every single individual is recognizable!" says Debra Fischer, division director of Astronomical Sciences at NSF. "Astronomers will be poring over this detailed portrait of more than three billion stars in the Milky Way for decades to come. This is a fantastic example of what partnerships across federal agencies can achieve."

Interactive access to the imaging with panning/zooming inside of a web browser is available from the LegacySurveyViewer, the World Wide Telescope, and Aladin.

The DECaPS2 dataset is available to the entire scientific community and is hosted by NOIRLab's Astro Data Lab, which is part of the Community Science and Data Center.

DECam was originally built to carry out the Dark Energy Survey, which was conducted by the Department of Energy and the U.S. National Science Foundation between 2013 and 2019.

The DECaPS2 team is composed of A. K. Saydjari (Harvard University and the Center for Astrophysics | Harvard & Smithsonian), E. F. Schlafly (Space Telescope Science Institute), D. Lang (Perimeter Institute for Theoretical Physics and the University of Waterloo), A. M. Meisner (NSF's NOIRLab), G. M. Green (Max Planck Institute for Astronomy), C. Zucker (Space Telescope Science Institute and the Center for Astrophysics | Harvard & Smithsonian), I. Zelko (Canadian Institute of Theoretical Astrophysics - University of Toronto), J. S. Speagle (University of Toronto), T. Daylan (Princeton University), A. Lee (Bill & Melinda Gates Foundation), F. Valdes (NSF’s NOIRLab), D. Schlegel (Lawrence Berkeley National Laboratory), and D. P. Finkbeiner (Harvard University and the Center for Astrophysics | Harvard & Smithsonian).