Data lost at alarming rate
- Written by: Tyler O'Neal, Staff Editor
- Category: BIG DATA
Eighty per cent of scientific data are lost within two decades, according to a new study that tracks the accessibility of data over time.
The culprits? Old e-mail addresses and obsolete storage devices.
"Publicly funded science generates an extraordinary amount of data each year," says Tim Vines, a visiting scholar at the University of British Columbia. "Many of these data are unique to a time and place and are thus irreplaceable, and many other datasets are expensive to regenerate.
"The current system of leaving data with authors means that almost all of it is lost over time, unavailable for validation of the original results or to use for entirely new purposes."
For the analysis, published today in Current Biology, Vines and colleagues attempted to collect original research data from a random set of 516 studies published between 1991 and 2011. They found that while all datasets were available two years after publication, the odds of obtaining the underlying data dropped by 17 per cent per year after that.
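To give a sense of how quickly a 17 per cent yearly drop in odds compounds, here is a rough sketch in Python. It assumes the decline starts at year two and applies at a constant rate every year after that, which is a simplification of what the article reports. The function names and the `starting_odds` parameter are illustrative and do not come from the study, which as summarised here does not state the starting odds.

```python
# Illustrative sketch only. The article reports a 17 per cent yearly drop in
# the odds of obtaining data; the starting odds are NOT given, so any value
# passed as starting_odds is a hypothetical input chosen by the reader.

def odds_after(years_elapsed, starting_odds=1.0, yearly_drop=0.17):
    """Odds remaining after a fixed percentage drop compounds each year."""
    return starting_odds * (1 - yearly_drop) ** years_elapsed


def odds_to_probability(odds):
    """Convert odds (successes per failure) to a probability between 0 and 1."""
    return odds / (1 + odds)


if __name__ == "__main__":
    # Assumption: the decline begins two years after publication.
    for paper_age in (2, 5, 10, 15, 20):
        relative_odds = odds_after(paper_age - 2)
        print(f"Paper age {paper_age:2d} years: "
              f"odds are {relative_odds:.1%} of their year-two level")
```

Note that odds are not the same as probability: a 17 per cent fall in odds does not mean 17 per cent fewer datasets are available each year. Given an estimate of the starting odds, `odds_to_probability` converts the result into a share of datasets that can still be obtained.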
"I don't think anybody expects to easily obtain data from a 50-year-old paper, but to find that almost all the datasets are gone at 20 years was a bit of a surprise," says Vines.
Vines is calling on scientific journals to require authors to upload their data to public archives as a condition of publication, adding that papers with readily accessible data are more valuable to society and should therefore get priority for publication.
"Losing data is a waste of research funds and it limits how we can do science," says Vines. "Concerted action is needed to ensure it is saved for future research."