ACADEMIA
Information overload in the era of big data
New search tools give scientists better ways of managing data
Botany is plagued by the same problem as the rest of science and society: our ability to generate data quickly and cheaply is surpassing our ability to access and analyze it. In this age of big data, scientists facing too much information rely on supercomputers to search large data sets for patterns that are beyond the capability of humans to recognize—but supercomputers can only interpret data based on the strict set of rules in their programming.
New tools called ontologies provide the rules supercomputers need to transform information into knowledge, by attaching meaning to data, thereby making those data retrievable by computers and more understandable to human beings. Ontology, from the Greek word for the study of being or existence, traditionally falls within the purview of philosophy, but the term is now used by computer and information scientists to describe a strategy for representing knowledge in a consistent fashion. An ontology in this contemporary sense is a description of the types of entities within a given domain and the relationships among them.
A new article in this month's American Journal of Botany by Ramona Walls (New York Botanical Garden) and colleagues describes how scientists build ontologies such as the Plant Ontology (PO) and how these tools can transform plant science by facilitating new ways of gathering and exploring data.
When data from many divergent sources, such as data about some specific plant organ, are associated or "tagged" with particular terms from a single ontology or set of interrelated ontologies, the data become easier to find, and computers can use the logical relationships in the ontologies to correctly combine the information from the different databases. Moreover, supercomputers can also use ontologies to aggregate data associated with the different subclasses or parts of entities.
For example, suppose a researcher is searching online for all examples of gene expression in a leaf. Any botanist performing this search would include experiments that described gene expression in petioles and midribs or in a frond. However, a search engine would not know that it needs to include these terms in its search—unless it was told that a frond is a type of leaf, and that every petiole and every midrib are parts of some leaf. It is this information that ontologies provide.
The article in the American Journal of Botany by Walls and colleagues describes what ontologies are, why they are relevant to plant science, and some of the basic principles of ontology development. It includes an overview of the ontologies that are relevant to botany, with a more detailed description of the PO and the challenges of building an ontology that covers all green plants. The article also describes four keys areas of plant science that could benefit from the use of ontologies: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education. Although most of the examples in this article are drawn from plant science, the principles could apply to any group of organisms, and the article should be of interest to zoologists as well.
As genomic and phenomic data become available for more species, many different research groups are embarking on the annotation of their data and images with ontology terms. At the same time, cross-species queries are becoming more common, causing more researchers in plant science to turn to ontologies. Ontology developers are working with the scientists who generate data to make sure ontologies accurately reflect current science, and with database developers and publishers to find ways to make it easier for scientist to associate their data with ontologies.