ACADEMIA
Cornell software captures diversity of microbial life
We live in a microbial world, with microscopic organisms filling discrete ecosystems in environments such as soil, lakes, skin, and even computer keyboards. Bacteria, for example, make up most of the Earth's biomass, dominating ecological functions such as carbon cycling, greenhouse gas emission, and oxygen production. Ninety percent of the cells in a human body are bacterial, as are 99% of the gene transcripts. Most of the microbial world has been inaccessible to us, a kind of biological “dark matter,” since we do not know how to culture over 97% of all bacteria, and microbial survey techniques to date have had significant limitations.
“All that is changing today, thanks to next-generation sequencing technologies that enable high-throughput microbial sampling,” said John Bunge, an associate professor in Cornell’s Department of Statistical Science who holds a joint appointment with Computing and Information Science. “The massive data produced by sequencers require advanced statistical tools capable of accurately estimating the total diversity or ‘species richness’ in a microbial population,” he explained.
Recognizing the need to continually improve software that employs modern, computationally intensive statistical analyses, Bunge and Linda Woodard, an ecological research specialist and database designer at the Cornell University Center for Advanced Computing, recently released version 3.0 of CatchAll. The program computes 12 different diversity estimates, with standard errors and goodness-of-fit assessments, at every level of outlier deletion. It proposes a best overall parametric estimate along with a ranked set of alternatives. For cases where low-frequency counts may be erroneous, CatchAll computes a discounted estimate by adjusting the diversity component of the selected mixture model. CatchAll is fast, platform-independent, and computationally robust, and it has both command-line and graphical user interfaces. An associated Excel spreadsheet automatically produces graphical displays.
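To give a sense of what a "species richness" estimate looks like in practice, the sketch below computes Chao1, a widely used nonparametric estimator, from frequency counts (how many taxa were observed exactly once, exactly twice, and so on). It is an illustration only and is not CatchAll's code or its parametric mixture-model method; the function name `chao1` and the sample counts are assumptions made for this example.

```python
def chao1(freq_counts):
    """Chao1 lower-bound estimate of total species richness.

    freq_counts maps j -> number of taxa observed exactly j times
    (hypothetical input format chosen for this illustration).
    """
    s_obs = sum(freq_counts.values())   # taxa actually observed
    f1 = freq_counts.get(1, 0)          # singletons
    f2 = freq_counts.get(2, 0)          # doubletons
    if f2 > 0:
        # Classic form: observed richness plus an estimate of unseen taxa.
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / 2


# Made-up sample: 10 taxa seen once, 5 seen twice, 2 seen three times.
sample = {1: 10, 2: 5, 3: 2}
print(chao1(sample))  # 17 observed + 100 / 10 unseen = 27.0
```

Many rarely seen taxa relative to the number seen twice push the estimate well above the observed count, which is why errors in low-frequency counts matter so much for richness estimation.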
Free executable downloads are available for Linux, Windows, and Mac OS platforms, with a manual and source code at www.northeastern.edu/catchall.
The software is featured in “Estimating population diversity with CatchAll,” published April 1 in the journal Bioinformatics by lead author John Bunge and Linda Woodard, Cornell University; Dankmar Bohning, professor and chair in medical statistics at the University of Southampton; James Foster, professor of biological sciences at the University of Idaho; Sean Connolly, associate at Charles River Associates; and Heather Allen, research scientist, USDA National Animal Disease Center Food Safety and Enteric Diseases Unit.
Besides working with scientists who are analyzing microbial diversity in areas such as marine life, Bunge collaborates on diversity analyses of other populations, such as lichens in redwood forest canopies.
The development of CatchAll was funded by the National Science Foundation.