Francis Crick Institute-built Taxonium online tool for evolutionary analysis shows SARS-CoV-2 variants converging
A new online platform is helping scientists use massive datasets to track the evolution of SARS-CoV-2 and other viruses.
An analysis of millions of SARS-CoV-2 genome sequences suggests that variants of the virus worldwide are repeatedly evolving the same mutations, according to a study published today in eLife and carried out by researchers at the Francis Crick Institute.
The analysis was made possible by Taxonium, a new web-based tool for exploring the reams of data that scientists around the globe have collected to monitor the genetic trajectory of the virus. Scientists can also use Taxonium to follow the evolution of other viruses or organisms.
Scientists have long tracked the evolution of viruses. But the urgent situation created by the COVID-19 pandemic launched a massive global collaboration that collected and sequenced the genomes of 13 million SARS-CoV-2 samples – far more genetic data than had ever been generated before. However, most existing tools designed to trace viral evolution cannot handle that much data.
"We needed a new tool that would allow us to explore the family tree represented by these millions of SARS-CoV-2 genome sequences," says the study's author Theo Sanderson, a Sir Henry Wellcome Fellow at the Francis Crick Institute in London, UK.
To help, Sanderson built Taxonium – a free, web-based interface that allows scientists to analyze the genetic relationships between tens of millions of virus samples. Scientists can access and analyze the data through a website or a desktop app. Taxonium can help them search for viruses with specific genetic mutations, or in a particular location, and zoom in on large viral family trees to find the information they need.
Sanderson teamed up with scientists at the University of California, Santa Cruz, to build a SARS-CoV-2-specific version of Taxonium called Cov2Tree, which organizes publicly available data on more than six million SARS-CoV-2 sequences into evolutionary trees. Using the tool, the team tracked the recent evolution of the SARS-CoV-2 virus and found that many separate regions of the tree showed the acquisition of similar changes in the Spike protein. The analysis suggests that the same mutations are occurring again and again in different individuals around the world and are persisting.
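To illustrate the idea behind this finding, the sketch below counts how many separate branches of a tree acquire the same mutation. It is a minimal illustration only, not Taxonium's or Cov2Tree's code: the `Node` class, both function names, and the toy tree are hypothetical, and the tree's shape is invented for the example rather than taken from the study.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `mutations` are those gained on the branch leading to it."""
    name: str
    mutations: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def count_independent_acquisitions(root: Node) -> Counter:
    """Count, for each mutation, the number of branches on which it arises."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        # Each branch contributes at most one acquisition event per mutation.
        counts.update(set(node.mutations))
        stack.extend(node.children)
    return counts


def recurrent_mutations(root: Node, min_events: int = 2) -> dict[str, int]:
    """Keep only mutations that arise on at least `min_events` separate branches."""
    counts = count_independent_acquisitions(root)
    return {m: n for m, n in counts.items() if n >= min_events}


# Toy tree with an invented shape: two lineages that each gain the same
# two Spike changes independently, plus one change seen only once.
toy_tree = Node("root", children=[
    Node("A", ["S:N501Y"], [Node("A1", ["S:E484K"])]),
    Node("B", ["S:E484K"], [Node("B1", ["S:N501Y"]), Node("B2", ["S:P681H"])]),
])

recurrent = recurrent_mutations(toy_tree)
print(recurrent)
```

In this toy tree, S:N501Y and S:E484K each arise on two separate branches and are reported as recurrent, while S:P681H arises once and is not. A real analysis would also need to account for sampling bias and for errors in the inferred tree, which this sketch ignores.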
"Scientists worldwide have used Cov2Tree to track the SARS-CoV-2 virus's evolution," says Sanderson. "But this application is probably just the start. Taxonium could be used to study the evolutionary tree of countless other viruses and bacteria."
Sanderson notes that Taxonium is just one part of a growing ecosystem of freely available online tools to help scientists manage what he calls the "avalanche of sequencing data". Many of these tools can be used together, and some offer features distinct from Taxonium's that may be better suited to specific tasks.
"With sequencing getting cheaper and cheaper, genetic sequence datasets as large as those created for SARS-CoV-2 are likely to become more common in the future," concludes Sanderson. "New tools to manage those datasets, like Taxonium, will be crucial to managing this new scale of data."