Christina Theodoris and her colleagues at Gladstone Institutes, the Broad Institute of MIT and Harvard, and Dana-Farber Cancer Institute trained a computer model to understand how genes interact. Photo: Michael Short/Gladstone Institutes

Gladstone Institutes' Theodoris builds ML models that predict the consequences of gene modifications

Researchers trained a computer model to understand the connections between thousands of genes and pinpoint how those connections go awry in human disease

Researchers at Gladstone Institutes in San Francisco, the Broad Institute of MIT and Harvard, and Dana-Farber Cancer Institute have turned to artificial intelligence (AI) to help them understand how large networks of interconnected human genes control the function of cells, and how disruptions in those networks cause disease. 

Large language models are one type of foundation model: AI systems that learn fundamental knowledge from massive amounts of general data and then apply that knowledge to accomplish new tasks, a process called transfer learning. These systems have recently gained mainstream attention with the release of ChatGPT, a chatbot built on a model from OpenAI.

In the new work, Gladstone Assistant Investigator Christina Theodoris, MD, Ph.D., developed a foundation model for understanding how genes interact. The new model, dubbed Geneformer, learns from massive amounts of data on gene interactions from a broad range of human tissues and transfers this knowledge to make predictions about how things might go wrong in disease.

Theodoris and her team used Geneformer to shed light on how heart cells go awry in heart disease. This method, however, can tackle many other cell types and diseases too.

"Geneformer has vast applications across many areas of biology, including discovering possible drug targets for disease," says Theodoris, who is also an assistant professor in the Department of Pediatrics at UC San Francisco. "This approach will greatly advance our ability to design network-correcting therapies in diseases where progress has been obstructed by limited data."

Theodoris designed Geneformer during a postdoctoral fellowship with X. Shirley Liu, Ph.D., former director of the Center for Functional Cancer Epigenetics at Dana-Farber Cancer Institute, and Patrick Ellinor, MD, Ph.D., director of the Cardiovascular Disease Initiative at the Broad Institute—both authors of the new study.

A Network View

Many genes, when active, set off cascades of molecular activity that trigger other genes to dial their activity up or down. Some of those genes, in turn, impact other genes—or loop back and put the brakes on the first gene. So, when a scientist sketches out the connections between a few dozen related genes, the resulting network map often looks like a tangled spiderweb.

If mapping out just a handful of genes in this way is messy, trying to understand connections between all 20,000 genes in the human genome is a formidable challenge. But such a massive network map would offer researchers insight into how entire networks of genes change with disease, and how to reverse those changes.

"If a drug targets a gene that is peripheral within the network, it might have a small impact on how the cell functions or only manage the symptoms of a disease," says Theodoris. "But by restoring the normal levels of genes that play a central role in the network, you can treat the underlying disease process and have a much larger impact."

Artificial Intelligence "Transfer Learning"

Typically, to map gene networks, researchers rely on huge datasets that include many similar cells. They use machine learning, a subset of AI, to work out patterns within the data. For example, a machine learning algorithm could be trained on a large number of samples from patients with and without heart disease, and then learn the gene network patterns that differentiate diseased samples from healthy ones.

However, standard machine learning models in biology are trained to only accomplish a single task. For the models to accomplish a different task, they have to be retrained from scratch on new data. So, if researchers from the first example now wanted to identify diseased kidney, lung, or brain cells from their healthy counterparts, they'd need to start over and train a new algorithm with data from those tissues.

The issue is that, for some diseases, there isn't enough existing data to train these machine-learning models.

In the new study, Theodoris, Ellinor, and their colleagues tackled this problem by leveraging a machine learning technique called "transfer learning" to train Geneformer as a foundational model whose core knowledge can be transferred to new tasks.

First, they "pretrained" Geneformer to have a fundamental understanding of how genes interact by feeding it data about the activity level of genes in about 30 million cells from a broad range of human tissues.

To demonstrate that the transfer learning approach was working, the scientists then fine-tuned Geneformer to make predictions about the connections between genes, or whether reducing the levels of certain genes would cause disease. Geneformer was able to make these predictions with much higher accuracy than alternative approaches because of the fundamental knowledge it gained during the pretraining process.
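The pretrain-then-fine-tune pattern can be sketched in a few lines. Everything below is a toy illustration, not Geneformer's actual transformer architecture: a frozen "pretrained" encoder supplies features, and only a small task-specific head is fit on the new labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained encoder. In Geneformer this is a transformer
# trained on ~30 million cells; here it is just a fixed (frozen) projection.
W_frozen = rng.normal(size=(16, 8))

def encode(x):
    # "Frozen" features: this function is never updated during fine-tuning.
    return np.tanh(x @ W_frozen)

# A small fine-tuning set: 40 toy "cells" with 16 "genes" and a binary label.
X = rng.normal(size=(40, 16))
w_true = rng.normal(size=16)
y = np.where(X @ w_true > 0, 1.0, -1.0)

# Fine-tune only a cheap linear head on top of the frozen features.
emb = encode(X)
head, *_ = np.linalg.lstsq(emb, y, rcond=None)
train_acc = float(np.mean(np.sign(emb @ head) == y))
```

Because only the small head is trained, far fewer labeled examples are needed than when training an entire model from scratch, which is the property that makes transfer learning attractive for data-poor diseases.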

In addition, Geneformer was able to make accurate predictions even when only shown a very small number of examples of relevant data.

"This means Geneformer could be applied to make predictions in diseases where research progress has been slow because we don't have access to sufficiently large datasets, such as rare diseases and those affecting tissues that are difficult to sample in the clinic," says Theodoris.

Lessons for Heart Disease

Theodoris's team next set out to use transfer learning to advance discoveries in heart disease. They first asked Geneformer to predict which genes would have a detrimental effect on the development of cardiomyocytes, the muscle cells in the heart.

Among the top genes identified by the model, many had already been associated with heart disease.

"The fact that the model predicted genes that we already knew were really important for heart disease gave us additional confidence that it was able to make accurate predictions," says Theodoris.

However, other potentially important genes identified by Geneformer had not been previously associated with heart disease, such as the gene TEAD4. And when the researchers removed TEAD4 from cardiomyocytes in the lab, the cells were no longer able to beat as robustly as healthy cells.

In this way, Geneformer used transfer learning to draw a new conclusion: even though it had not been fed any information on cells lacking TEAD4, it correctly predicted the important role that TEAD4 plays in cardiomyocyte function.
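One way to picture this kind of in silico perturbation is to delete one gene from a cell's encoding and measure how far the cell's embedding shifts; genes whose deletion moves the embedding furthest are flagged as most critical. The sketch below is deliberately simplified (mean pooling over a made-up embedding table, not Geneformer's transformer over rank-encoded gene tokens).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy embedding table for 100 "genes" (purely illustrative numbers).
gene_emb = rng.normal(size=(100, 16))

def cell_embedding(gene_ids):
    # Encode a cell as the mean embedding of its expressed genes.
    return gene_emb[gene_ids].mean(axis=0)

cell = np.arange(20)              # a cell expressing genes 0..19
base = cell_embedding(cell)

def deletion_shift(gene):
    # In silico deletion: drop one gene, re-embed, measure the shift.
    return float(np.linalg.norm(cell_embedding(cell[cell != gene]) - base))

# Rank genes by how much their removal perturbs the cell state.
shifts = {int(g): deletion_shift(g) for g in cell}
ranked = sorted(shifts, key=shifts.get, reverse=True)
```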

Finally, the group asked Geneformer to predict which genes should be targeted to make diseased cardiomyocytes resemble healthy cells at a gene network level. When the researchers tested two of the proposed targets in cells affected by cardiomyopathy (a disease of the heart muscle), they indeed found that removing the predicted genes using CRISPR gene editing technology restored the beating ability of diseased cardiomyocytes.

"In the course of learning what a normal gene network looks like and what a diseased gene network looks like, Geneformer was able to figure out what features can be targeted to switch between the healthy and diseased states," says Theodoris. "The transfer learning approach allowed us to overcome the challenge of limited patient data to efficiently identify possible proteins to target with drugs in diseased cells."

"A benefit of using Geneformer was the ability to predict which genes could help to switch cells between healthy and disease states," says Ellinor. "We were able to validate these predictions in cardiomyocytes in our laboratory at the Broad Institute."

The researchers are planning to expand the number and types of cells that Geneformer has analyzed to keep boosting its ability to analyze gene networks. They've also made the model open-source so that other scientists can use it.

"With standard approaches, you have to retrain a model from scratch for every new application," says Theodoris. "The really exciting thing about our approach is that Geneformer's fundamental knowledge about gene networks can now be transferred to answer many biological questions, and we're looking forward to seeing what other people do with it."

Modeling by UMass Amherst researcher demonstrates strategy is a win-win for combating climate change

Carbon markets have become a critical policy tool to combat climate change. They allow firms that emit greenhouse gases to buy and sell the right to pollute, which gives the firms flexibility while also reducing carbon emissions at the lowest cost. A patchwork of dozens of markets exists around the world, often with drastically different prices for carbon credits. In a new paper, a University of Massachusetts Amherst resource economist demonstrates that linking fragmented carbon markets with an exchange rate has the potential to be a significant step toward forming a global climate policy.

Matt Woerman, assistant professor of resource economics at UMass Amherst, explores linking carbon markets using an allowance exchange rate, which denominates the compliance value of an emissions allowance differently in each program. Using simulation modeling, he finds that while an exchange rate may reduce emissions abatement in certain programs, it achieves greater emissions reductions and cost efficiencies overall.
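The intuition for why linking lowers costs can be shown with a toy two-region model with quadratic abatement costs. The numbers and functional forms below are invented for illustration and are not Woerman's actual simulations: trading reallocates abatement toward the cheaper region until marginal costs equalize.

```python
# Two regions with marginal abatement cost MC_i(a) = c_i * a (toy numbers).
c1, c2 = 2.0, 8.0        # abating is four times costlier in region 2
A = 20.0                 # total abatement the two regions must deliver

# Autarky: each region abates half the total on its own.
a1_aut = a2_aut = A / 2
cost_autarky = c1 * a1_aut**2 / 2 + c2 * a2_aut**2 / 2

# Linked 1:1 market: allowances trade until marginal costs equalize,
# i.e. c1 * a1 == c2 * a2 subject to a1 + a2 == A.
a1_link = A * c2 / (c1 + c2)
a2_link = A - a1_link
cost_linked = c1 * a1_link**2 / 2 + c2 * a2_link**2 / 2
```

In these toy units the linked market delivers the same total abatement at lower cost (320 versus 500). An allowance exchange rate r would generalize the equilibrium condition to c1 * a1 == r * c2 * a2, which is the extra policy lever Woerman describes.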

“Climate change is a global problem,” Woerman says. “Linking carbon markets with an allowance exchange rate is a great step toward a larger global climate policy that we need to solve climate change.”

The modeling indicates that an exchange rate among linked carbon markets in various regions would move prices closer together and reduce pollution.

“This suggests that both regions would win, and the environment wins,” Woerman says.

One potential downside of linking carbon markets is that each participating jurisdiction would give up a small amount of its sovereignty, but Woerman’s research finds that the exchange rate would act as a cushion of sorts.

“It’s this extra lever that allows policymakers to still retain some of that sovereignty and not force everything to be equal across the linked markets,” he notes.

Woerman hopes the findings can be used to build political momentum toward forming larger coalitions to trade carbon credits.

“The next step is to think about how this work fits into a more dynamic framework in the longer run, particularly how we can use the allowance exchange rate perhaps as a first step toward a global market,” Woerman says.

The paper is based on a previous manuscript coauthored by Dallas Burtraw and Karen Palmer of Resources for the Future and Clayton Munnings, a U.S. strategic advisor for the International Emissions Trading Association. The research was supported by Mistra Carbon Exit and the nonprofit research institution Resources for the Future Electric Power Program.

The paper appears in the Journal of Environmental Economics and Management and is available at https://www.sciencedirect.com/science/article/pii/S0095069623000384.

China's Institute of Oceanology uses deep learning for global estimation of phytoplankton pigment concentrations

The phytoplankton community structure can reflect changes in the marine environment and help us understand the driving factors behind ecological evolution. Quantifying pigment concentrations in phytoplankton is crucial for a comprehensive assessment of taxonomic classification and community structure.

Recently, a research team led by Prof. LI Xiaofeng from the Institute of Oceanology of the Chinese Academy of Sciences (IOCAS) has made progress in the inversion of global phytoplankton pigment concentrations using deep learning algorithms. Using satellite data, they developed a deep-learning-based model (DL-PPCE model) for estimating concentrations of 17 different phytoplankton pigments globally.

The study was published in Remote Sensing of Environment on May 19.

The model inputs include ocean color parameters, satellite-derived environmental parameters, and the slope of above-surface remote-sensing reflectance. The model was validated against high-performance liquid chromatography (HPLC) data and proved advantageous for analyzing phytoplankton community dynamics on large spatiotemporal scales.
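As a sketch of what a multi-output regression of this kind looks like, the snippet below maps a handful of reflectance-like features to 17 pigment outputs. It uses synthetic data and a generic small network; the actual DL-PPCE inputs, architecture, and training are those described in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 "pixels" with 6 reflectance-like features.
# (The real model's inputs include ocean color and environmental parameters.)
X = rng.uniform(size=(200, 6))
M = rng.normal(size=(6, 17))
Y = X @ M + 0.01 * rng.normal(size=(200, 17))   # 17 pigment concentrations

# A generic small network standing in for the deep model.
model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=3000, random_state=0)
model.fit(X, Y)
pred = model.predict(X)                          # one row of 17 pigments per pixel
```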

Using the established DL-PPCE model, the researchers conducted a time series analysis of global pigment concentrations retrieved by the Moderate Resolution Imaging Spectroradiometer (MODIS) during the period 2003-2021. They found that the prokaryote-dominated area extended eastward from 180°E to 150°W during the 2015/2016 El Niño event. From 2003 to 2021, prokaryotic abundance was positively correlated with El Niño intensity but negatively correlated with the overall abundance of the phytoplankton community.

Ocean color remote sensing enables the retrieval of phytoplankton absorption, which is directly linked to pigment concentration. "However, the simultaneous retrieval of multiple pigment concentrations globally is challenging due to optical property variability in seawater and the packaging effect on phytoplankton absorption," said LI Xiaolong, the first author of the study.

"In our study, we employ a novel approach to estimate global phytoplankton pigment concentrations," said Prof. LI, corresponding author of the study. "By avoiding assumptions about pigment absorption spectra and employing deep learning, we established non-linear relationships between remote sensing variables and phytoplankton pigment concentrations. This approach yielded high accuracy in estimating pigment concentrations."

An artist's conception of an alien device that generates repetitive signals. Image: Breakthrough Listen / Danielle Futselaar

Cornell develops software based on an FFA that offers a new way to listen for signals from the stars

The Breakthrough Listen Investigation for Periodic Spectral Signals (BLIPSS), led by Akshay Suresh, Cornell doctoral candidate in astronomy, is pioneering a search for periodic signals emanating from the core of our galaxy, the Milky Way. The research aims to detect repetitive patterns, a way to search for extraterrestrial intelligence (SETI) within our cosmic neighborhood. 

The researchers developed software based on a Fast Folding Algorithm (FFA), an efficient search method offering enhanced sensitivity to periodic sequences of narrow pulses. Their paper, “A 4–8 GHz Galactic Center Search for Periodic Technosignatures,” was published May 30 in The Astronomical Journal.

Pulsars -- rapidly rotating neutron stars that sweep beams of radio energy across the Earth -- are natural astrophysical objects that generate periodic signals. But humans also use directed periodic transmissions for a variety of applications, including radar. Such signals would be a good way to get someone’s attention across interstellar space, standing out from the background of non-periodic signals, and they would use much less energy than a transmitter that broadcasts continuously.

“BLIPSS is an example of cutting-edge software as a science multiplier for SETI,” said Suresh. “Our study introduces to SETI, for the first time, the Fast Folding Algorithm; our open-source software utilizes an FFA to crunch over 1.5 million time series for periodic signals in roughly 30 minutes.”
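The idea behind a folding search can be illustrated with a brute-force version: fold the time series at each trial period and score how sharply the folded profile peaks. The real FFA reaches the same answer with far fewer operations by reusing partial sums, and the data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic time series: Gaussian noise plus a narrow pulse every 50 samples.
n, true_period = 5000, 50
x = rng.normal(scale=0.5, size=n)
x[::true_period] += 5.0

def fold_snr(x, p):
    # Fold the series at trial period p and score how sharply the
    # phase-averaged profile peaks above its mean.
    m = (len(x) // p) * p
    profile = x[:m].reshape(-1, p).mean(axis=0)
    return (profile.max() - profile.mean()) / (profile.std() + 1e-12)

# Brute-force search over trial periods (the FFA does this far more cheaply).
trial_periods = range(40, 61)
best_p = max(trial_periods, key=lambda p: fold_snr(x, p))
```

Only the correct trial period stacks every pulse into the same phase bin, so the injected period stands out sharply from its neighbors.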

BLIPSS is a collaborative effort between Cornell, the SETI Institute, and Breakthrough Listen. The project significantly enhances the probability of capturing evidence of extraterrestrial technology by focusing on the central region of the Milky Way, known for its dense concentration of stars and potentially habitable exoplanets. The center of the Milky Way would also be an ideal location for an alien civilization to place a beacon to contact large swaths of the Galaxy.

The team tested their algorithm on known pulsars and were able to detect periodic emissions as expected. They then turned to a larger dataset of scans of the Galactic Center undertaken using the Breakthrough Listen instrument on the 100-meter Green Bank Telescope (GBT) in West Virginia. In contrast to pulsars, which emit across a wide swath of radio frequencies, BLIPSS looked for repeating signals in a narrower range of frequencies, covering less than one-tenth of the width of an average FM radio station.

“The combination of these relatively narrow bandwidths with periodic patterns could be indicative of deliberate technological activities of intelligent civilizations,” said co-author Steve Croft, Breakthrough Listen project scientist. “Breakthrough Listen captures huge volumes of data, and Akshay’s technique provides a new method to help us search that haystack for needles that could provide tantalizing evidence of advanced extraterrestrial life forms.”

“Until now, radio SETI has primarily dedicated its efforts to the search for continuous signals,” said co-author Vishal Gajjar, a SETI Institute astronomer. “Our study sheds light on the remarkable energy efficiency of a train of pulses as a means of interstellar communication across vast distances. Notably, this study marks the first-ever comprehensive endeavor to conduct in-depth searches for these signals.”

The multi-model mean of the yearly percentage of cropland experiencing flash drought over entire continents for the historical (black), SSP126 (blue), SSP245 (orange), and SSP585 (red) scenarios. A 30-year centered moving average is applied to each time series. The shaded regions indicate the variability (±1σ) among the 30-year centered moving averages between all six models for the corresponding historical and future scenarios.

OU’s climate researcher Christian projects cropland risk from flash droughts using global climate models

The rapid development of unexpected drought, called flash drought, can severely impact agricultural and ecological systems with ripple effects that extend even further. Researchers at the University of Oklahoma are assessing how our warming climate will affect the frequency of flash droughts and the risk to croplands globally. Jordan Christian, a postdoctoral researcher, is the lead author of the study.

“In this study, projected changes in flash drought frequency and cropland risk from flash drought are quantified using global climate model simulations,” Christian said. “We find that flash drought occurrence is expected to increase globally among all scenarios, with the sharpest increases seen in scenarios with higher radiative forcing and greater fossil fuel usage.”

A figure showing the impact of a flash drought on a grassland in Oklahoma. The photos on the top row show the impact of the flash drought on the ecosystem compared with photos of the same area without flash drought impacts (bottom row).

Radiative forcing describes an imbalance of radiation in which more radiation enters Earth’s atmosphere than leaves it. Activities such as burning fossil fuels are among the most significant contributors to climate warming. The changing climate is expected to increase severe weather events, from storms and flash flooding to flash droughts and more.

“Flash drought risk over cropland is expected to increase globally, with the largest increases projected across North America and Europe,” Christian said.

“CMIP6 models projected a 1.5 times increase in the annual risk of flash droughts over croplands across North America by 2100, from a 32% yearly risk in 2015 to 49% in 2100, while Europe is expected to have the largest increase under the most extreme emissions scenario (32% to 53%), a 1.7 times increase in annual risk,” he said.
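As a quick sketch (synthetic numbers except the quoted 32%, 49%, and 53% endpoints), the reported risk multipliers and the 30-year centered moving average used to smooth the figure's curves can be reproduced like this:

```python
import numpy as np

def centered_moving_average(series, window=30):
    # Centered moving average; output is (window - 1) points shorter.
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

# Sanity-check the quoted risk multipliers.
na_ratio = 49 / 32                # ~1.53, reported as "1.5 times"
eu_ratio = 53 / 32                # ~1.66, reported as "1.7 times"

# Smooth a toy annual risk series the way the figure's curves are smoothed.
years = np.arange(1950, 2101)
risk = 0.32 + 0.002 * (years - 2015) + 0.02 * np.sin(years / 3.0)
smooth = centered_moving_average(risk, window=30)
```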

Jeffrey Basara, an associate professor in the School of Meteorology in the College of Atmospheric and Geographic Sciences and the School of Civil Engineering and Environmental Sciences in the Gallogly College of Engineering, is Christian’s faculty advisor and study co-author. Basara is the executive associate director of the hydrology and water security program and leads OU’s Climate, Hydrology, Ecosystems, and Weather research group. The researchers have been investigating ways to improve flash drought identification and prediction since 2017, with multiple papers published in journals.

“This study continues to emphasize that agricultural producers, both domestic and abroad, will face increasing risks associated with water availability due to the rapid development of drought. As a result, socioeconomic pressures associated with food production, including higher prices and social unrest, will also increase when crop losses occur due to flash drought,” Basara said.