ENGINEERING
Researchers Find Surprising Similarities Between Genetic, Computer Codes
Computational biologist Sergei Maslov of Brookhaven National Laboratory worked with graduate student Tin Yau Pang from Stony Brook University to compare the frequency with which components "survive" in two complex systems: bacterial genomes and operating systems on Linux supercomputers. Their work is published in the Proceedings of the National Academy of Sciences.
Maslov and Pang set out not only to determine why some specialized genes or computer programs are very common while others are fairly rare, but also to see how many components in any system are so important that they can't be eliminated. "If a bacteria genome doesn't have a particular gene, it will be dead on arrival," Maslov said. "How many of those genes are there? The same goes for large software systems. They have multiple components that work together and the systems require just the right components working together to thrive."
Using data from the massive sequencing of bacterial genomes, now a part of the DOE Systems Biology Knowledgebase (KBase), Maslov and Pang examined the frequency of usage of crucial bits of genetic code in the metabolic processes of 500 bacterial species and found a surprising similarity with the frequency of installation of 200,000 Linux packages on more than 2 million individual computers. Linux is an open source software collaboration that allows designers to modify source code to create programs for public use.
The most frequently used components in both the biological and supercomputer systems are those that allow for the most descendants. That is, the more a component is relied upon by others, the more likely it is to be required for full functionality of a system.
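To make the idea of "relied upon by others" concrete, here is a minimal sketch that counts, for each component in a small dependency map, how many other components depend on it directly or indirectly. The package names and the dependency map are hypothetical and chosen only for illustration; they are not data from the study, and this is not the researchers' own method or code.

```python
from collections import defaultdict

# Hypothetical toy dependency map (an assumption, not data from the study):
# each package is mapped to the packages it directly depends on.
DEPENDS_ON = {
    "libc": [],
    "zlib": ["libc"],
    "openssl": ["libc", "zlib"],
    "curl": ["openssl", "zlib"],
    "git": ["curl", "zlib"],
}


def count_dependents(depends_on):
    """Map each component to the number of other components that rely on it,
    directly or through a chain of dependencies."""
    # Invert the edges: for each component, record who depends on it directly.
    direct_dependents = defaultdict(set)
    for package, dependencies in depends_on.items():
        for dependency in dependencies:
            direct_dependents[dependency].add(package)

    counts = {}
    for package in depends_on:
        seen = set()
        stack = [package]
        # Depth-first walk over everything that relies on this package.
        while stack:
            current = stack.pop()
            for dependent in direct_dependents.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    stack.append(dependent)
        counts[package] = len(seen)
    return counts


DEPENDENT_COUNTS = count_dependents(DEPENDS_ON)

# List components from most to least relied upon.
for name, n in sorted(DEPENDENT_COUNTS.items(), key=lambda item: -item[1]):
    print(f"{name}: {n} dependents")
```

In this toy map, the component at the base of the dependency chain has the most dependents, which matches the intuition in the article that heavily relied-upon components are the ones a working system is most likely to need.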
It may seem logical, but the surprising part of this finding is how universal it is. "It is almost expected that the frequency of usage of any component is correlated with how many other components depend on it," said Maslov. "But we found that we can determine the number of crucial components – those without which other components couldn't function – by a simple calculation that holds true both in biological systems and computer systems."
For both the bacteria and the computing systems, take the square root of the number of interdependent components and you get the number of key components, those so important that no other piece can function without them.
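The square-root rule can be tried out as simple arithmetic. The sketch below assumes the rule is applied to a plain count of interdependent components; the article does not say exactly which count the researchers used, so plugging in the 200,000 Linux packages mentioned earlier is an illustration only, not a figure reported by the study. The function name is hypothetical.

```python
import math


def estimate_key_components(total_components):
    """Square-root rule as described in the article: the number of key
    components is roughly the square root of the number of interdependent
    components. Which count to plug in is an assumption made here."""
    if total_components < 0:
        raise ValueError("component count must be non-negative")
    return math.sqrt(total_components)


# Illustrative input only: the package count quoted in the article.
LINUX_PACKAGES = 200_000
ESTIMATE = estimate_key_components(LINUX_PACKAGES)
print(f"about {round(ESTIMATE)} key components out of {LINUX_PACKAGES}")
```

Under that assumption, the rule would put the number of indispensable components in the hundreds rather than the thousands, a small fraction of the whole system.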
Maslov's finding applies equally to these complex networks because they are both examples of open access systems with components that are independently installed. "Bacteria are the ultimate BitTorrents of biology," he said, referring to a popular file-sharing protocol. "They have this enormous common pool of genes that they are freely sharing with each other. Bacterial systems can easily add or remove genes from their genomes through what's called horizontal gene transfer, a kind of file sharing between bacteria."
The same goes for Linux operating systems, which allow free installation of components built and shared by a multitude of designers independently of one another. The theory wouldn't hold true for, say, a Windows operating system, which only runs proprietary programs.
Maslov is co-principal investigator in the KBase program, which is led by principal investigator Adam Arkin of DOE's Lawrence Berkeley National Laboratory, with additional co-principal investigators Rick Stevens of DOE's Argonne National Laboratory and Robert Cottingham of DOE's Oak Ridge National Laboratory. Supported by DOE's Office of Science, the KBase program provides a supercomputing environment that enables researchers to access, integrate, analyze and share large-scale genomic data to facilitate scientific collaboration and accelerate the pace of scientific discovery.
DOE's Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.