Montana State computer scientists help expand horizon of genetics research

A small change in a gene or two, among the millions or even billions of base pairs that make up an organism's DNA, is often all that sets apart a drought-tolerant plant or a person predisposed to cancer.

That's why a better understanding of genetic variation within a species could, among other things, help improve the selection of crops for local conditions and detection of disease, according to Joann Mudge, a senior research scientist at the nonprofit National Center for Genome Resources.

A generation ago, recording an organism's DNA from beginning to end was so laborious and expensive that scientists celebrated when they completed the task for a single bacterium. But as genome sequencing becomes faster and cheaper, scientists increasingly have access to insights about which genes do what, Mudge said.

"We're sequencing multiple individuals of some species," including plants and other complex organisms, Mudge said. That allows scientists to begin to sort out which segments of DNA form a species' core genome and which correspond to traits shared by only some individuals, she said.

Photo: Montana State computer science professor Brendan Mumey, right, and assistant professor Indika Kahanda guide graduate students Lucia Williams and Buwani Manuweera through coding as part of the pangenomics project on May 16, 2019. (MSU Photo by Adrian Sanchez-Gonzalez)
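To make the distinction between a species' core genome and its variable parts concrete, here is a minimal Python sketch. It assumes each strain's genome has already been reduced to a set of gene names (real pipelines get there through annotation and clustering, which this skips), and the strain and gene names are hypothetical. Genes present in every strain form the core; the rest are "accessory" genes carried by only some individuals. It illustrates the concept only and is not the NCGR/MSU software.

```python
# Hypothetical gene content for three strains; the names are made up
# for illustration and are not real yeast annotations.
strains = {
    "strain_a": {"geneA", "geneB", "geneC", "geneD"},
    "strain_b": {"geneA", "geneB", "geneC", "geneE"},
    "strain_c": {"geneA", "geneB", "geneF"},
}

def partition_pangenome(strains):
    """Split a pangenome into core genes (found in every strain) and
    accessory genes (found in some, but not all, strains)."""
    gene_sets = list(strains.values())
    pangenome = set().union(*gene_sets)   # every gene seen in any strain
    core = set.intersection(*gene_sets)   # genes shared by all strains
    accessory = pangenome - core          # genes found in only some strains
    return pangenome, core, accessory

pan, core, accessory = partition_pangenome(strains)
print(sorted(core))       # ['geneA', 'geneB']
print(sorted(accessory))  # ['geneC', 'geneD', 'geneE', 'geneF']
```

Set intersection and union keep the sketch short; with hundreds of strains, as in the yeast study, a real tool would also need to decide what counts as "the same gene" across strains, which is where much of the computational difficulty lies.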

But the growing field of pangenomics, as it is called, presents a major analytical challenge. That's why NCGR recently partnered with Montana State University computer scientists to develop software that can compare multiple genomes and make sense of the results. The project is backed by a three-year, $662,000 grant from the National Science Foundation.

"We've been very happy with the way it's working," said Brendan Mumey, a professor in the Gianforte School of Computing in MSU's Norm Asbjornson College of Engineering. He and Mudge are co-leading the project.

According to Mumey, previously available software struggled to analyze pangenomes even for relatively simple organisms such as the common yeast Saccharomyces cerevisiae, whose genome contains only about 12 million of the DNA units known as base pairs. (By comparison, the human genome contains about 3 billion base pairs.) Among the known strains of the yeast, minor genetic variations account for physical adaptations such as the ability of brewer's yeast to tolerate the alcohol produced during the making of beer and wine.

"It's a classic 'big data' problem," Mumey said, using the term for data sets so large and complex that they strain conventional computing tools.

MSU assistant professor of computer science Indika Kahanda, a member of the research team, specializes in developing the "machine learning" models that help the new software adjust its gene-sorting analysis according to input from scientists. That approach has helped the team, which includes NCGR research scientist Thiru Ramaraj, identify genes of interest in a yeast pangenome that includes roughly 100 strains. Ramaraj earned his doctorate in computer science in 2010 at MSU, where Mumey was his adviser.

Mumey said the researchers' next step is to continue to refine the software so it can handle larger and more complex genomes, such as those of plants. The computational techniques being used "are still in their infancy," he said.

Eventually, pangenomics could help medical professionals diagnose a variety of diseases that have a genetic component, Mudge said. Most inherited breast cancer can be traced to mutations in just two genes, but other genetic diseases are thought to stem from more complex changes across larger areas of DNA.

The improved pangenomics tool is already helping scientists move beyond comparing genomes to a single, arbitrary reference, Mudge said. Instead, researchers can represent the full genetic range of a species, with all its nuance and variety.
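One widely used way to avoid privileging a single reference is to store several genomes in one graph. The Python sketch below builds a small colored de Bruijn-style graph: nodes are short substrings of length k (k-mers), an edge connects two k-mers that occur back to back in some genome, and each node records which genomes contain it. Places where the graph branches mark variation between genomes. This is a generic technique chosen for illustration; the article does not say which data structure the MSU/NCGR tool uses, and the sequences and function name are hypothetical.

```python
from collections import defaultdict

def build_colored_kmer_graph(genomes, k):
    """Build a colored de Bruijn-style graph (hypothetical helper).

    Nodes are k-mers; an edge joins two k-mers that appear consecutively
    in at least one genome; each node records which genomes ("colors")
    contain it. No single genome is treated as the reference.
    """
    colors = defaultdict(set)   # k-mer -> names of genomes containing it
    edges = defaultdict(set)    # k-mer -> k-mers that follow it
    for name, seq in genomes.items():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for kmer in kmers:
            colors[kmer].add(name)
        for left, right in zip(kmers, kmers[1:]):
            edges[left].add(right)
    return colors, edges

# Two toy sequences that differ at a single position.
genomes = {"strain_a": "ACGTAC", "strain_b": "ACGTTC"}
colors, edges = build_colored_kmer_graph(genomes, k=3)

# Nodes with more than one successor are where the genomes diverge.
branch_points = {kmer: sorted(nxt) for kmer, nxt in edges.items() if len(nxt) > 1}
print(branch_points)  # {'CGT': ['GTA', 'GTT']}
```

Both toy genomes share the path through ACG and CGT, then split; the colors show that the GTA branch belongs only to strain_a and the GTT branch only to strain_b, so each genome's variation is kept rather than measured against the other.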

"It's a hard problem to solve," Mudge said. "This has been a great collaboration."
