AI, big data predict which research will influence future medical treatments

Ian Hutchins and colleagues in the Office of Portfolio Analysis (OPA), a team led by George Santangelo at the National Institutes of Health (NIH), have developed an artificial intelligence/machine learning model that predicts which scientific advances are likely to eventually translate to the clinic. The work, described in a Meta-Research article published October 10 in the open-access journal PLOS Biology, aims to shorten the sometimes decades-long interval between scientific discovery and clinical application. The method estimates the likelihood that a research article will be cited by a future clinical trial or guideline, an early indicator of translational progress.

Hutchins and colleagues have quantified these predictions, which are highly accurate with as little as two years of post-publication data, as a novel metric called "Approximate Potential to Translate" (APT). APT values can be used by researchers and decision-makers to focus attention on areas of science that have strong signatures of translational potential. Although numbers alone should never be a substitute for evaluation by human experts, the APT metric has the potential to accelerate biomedical progress as one component of data-driven decision-making.

CAPTION: This image depicts the co-citation network of seminal fundamental publications that led to the clinical development of cancer immunotherapy treatments. Large dots (center) represent the most influential clinical trials that formed part of the evidence base for FDA approval of these treatments. Heat mapping indicates the extent to which the research was human-focused; at the extremes, each green dot represents a fundamental research publication and each red dot a publication describing human research. This network was generated using open access data from the new modules of the iCite webtool described in two new articles from Hutchins and colleagues.

CREDIT: Ian Hutchins and George Santangelo

The model that computes APT values makes predictions based upon the content of research articles and the articles that cite them. A long-standing barrier to the research and development of metrics like APT is that such citation data have remained hidden behind proprietary, restrictive, and often costly licensing agreements. To remove this impediment for the scientific community, increase transparency, and facilitate reproducibility, OPA has aggregated citation data from publicly available resources to create an open citation collection (NIH-OCC), the details of which appear in a Community Page article in the same issue of PLOS Biology. The NIH-OCC currently comprises over 420 million citation links and will be updated monthly as citations continue to accumulate. For publications since 2010, the NIH-OCC is already more comprehensive than leading proprietary sources of citation data.

Citation data from the NIH-OCC are used to calculate both APT values and Relative Citation Ratios (RCRs). The RCR, a measure of scientific influence at the article level that is normalized for field of study and time since publication, was developed previously by Santangelo's team at NIH and has already been widely adopted in both the scientific and evaluator communities. Upon publication of these two articles, APT values and the NIH-OCC will be freely and publicly available as new components of the iCite webtool, which will continue as the primary source of RCR data (https://icite.od.nih.gov/). The OPA team encourages the use of iCite to improve research assessment and decision-making that can contribute to optimizing the scientific enterprise.