A Portal to Science Education
WASHINGTON, DC -- The National Science Foundation is on a quest to build the ultimate Internet portal for science and technology students and educators: a digital library impressive not only for its size, but also for its usefulness. A key component of this effort to build the National Science, Mathematics, Engineering and Technology Education Digital Library, or NSDL, will be the development of a system for cataloging terabytes of data to support sophisticated search capabilities. Coordinators say the NSDL will be able to return content that has been reviewed for accuracy and analyzed for true relevance to a query. This will enable users to more easily find the exact resources they need within the immense collection.
“There is no lack of great piles of content on the Web, but there is an urgent need for piles of great content,” says Lee Zia, lead program director of the NSDL effort.
To fulfill the need for great content, a digital library’s builders must develop reliable methods for collating many types of data, including text, images, sounds and applications. At the base of such a monumental undertaking, says Zia, is the development of a system for acquiring and organizing metadata, or data about data.
A New Age, but an Old Problem
Consider the old card catalogs in your local library. Stuffed with index cards, they were hard to maintain, inflexible and unwieldy to use as library collections grew.
Thanks in part to constant advances in storage technology, it’s now possible to store millions of items in a single digital library. And high-bandwidth networks have dramatically broadened access to existing material. But as the scale of libraries has grown with digitization, so has the problem of organizing collections.
That’s why the National Science Foundation is focusing on the cataloging issue as an integral part of the NSDL program, which builds on the research efforts of the U.S. government’s multi-agency Digital Libraries Initiative launched in 1994.
“It’s more than just another website,” says Zia of the NSDL project. “We’re essentially trying to tame the Web. We want anyone who uses the portal to be able to find the most up-to-date, relevant information about whatever topic they’re researching or learning or teaching about.”
For example, suppose you were to search for the latest studies about teaching math to 10-year-olds. A standard Internet search engine might provide the URL of a potentially interesting website, along with many more sites that are not pertinent but still use similar terms. The NSDL system would instead return an intelligently filtered list of references. It also would provide information about those sources, such as peer reviews and citation lists, making it easier to determine which materials are worth further exploration.
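As a rough illustration only, the difference between matching on terms and filtering on metadata can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the example: the record fields (keywords, year, peer_reviewed, cited_by), the search function and the sample catalog are hypothetical and do not describe how the NSDL itself is built.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical metadata record; the field names are illustrative, not the NSDL's."""
    title: str
    keywords: set
    year: int
    peer_reviewed: bool = False
    cited_by: list = field(default_factory=list)


def search(records, terms, reviewed_only=True):
    """Return records whose keywords contain every term, newest first, then most cited."""
    wanted = {t.lower() for t in terms}
    hits = [
        r for r in records
        if wanted <= {k.lower() for k in r.keywords}
        and (r.peer_reviewed or not reviewed_only)
    ]
    return sorted(hits, key=lambda r: (r.year, len(r.cited_by)), reverse=True)


# Made-up sample catalog, for illustration only.
catalog = [
    Record("Fractions for fourth graders", {"math", "teaching", "elementary"}, 2000,
           peer_reviewed=True, cited_by=["Study A", "Study B"]),
    Record("Math tricks that sell", {"math", "teaching"}, 2001),
    Record("Teaching pond ecology", {"biology", "teaching"}, 1999, peer_reviewed=True),
]

for r in search(catalog, ["math", "teaching"]):
    print(r.title, r.year, len(r.cited_by))
```

In this toy catalog, a search on terms alone would also return the unreviewed item; the metadata is what allows it to be filtered out, and the citation list is what a reader would use to judge the item that remains.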
“We want the NSDL to help you understand the information you find,” says Zia. “That way, collections can be that much richer.”
The Challenge of Organization
For a digital library to be truly useful, librarians need to answer two key questions: What metadata should be stored, and how should it be organized so that the content is retrievable in a useful way?
One institution looking at these issues is Cornell University, which was awarded more than $1.5 million this year by the National Science Foundation to help provide core integration capabilities for the NSDL program.
William Arms, professor of computer science at Cornell, says computers can directly analyze original materials, including books, maps and other records that have been digitized. Traditional cataloging methods, by contrast, would make it all but impossible to put the staggering amounts of data in libraries like the NSDL to real use.
“Traditional methods just aren’t scalable,” he says. “Building a system with standards, like the old catalog files, takes years. Both the content and the content’s formats are so dynamic; information changes, and technology evolves. They must be updated so frequently that compiling the information by hand will be too expensive and slow.”
A Commitment to Research
Cornell is among several institutions to receive grants for research supporting the digital library effort. Reflecting its commitment to the NSDL, the National Science Foundation is providing nearly $39 million in funds for projects in four tracks: collections, services, core integration and targeted research.
According to Zia, the fundamental task of collections projects is to identify and aggregate high-quality learning materials and resources, while services projects focus on developing tools and applications that improve a user’s ability to exploit the library’s contents. Under core integration are projects providing technology and educational community involvement to support the interaction of collections and services. Targeted research further supports these efforts through a variety of activities, such as user studies or targeted evaluation studies.
“Although collections is about the material itself, core integration is the organizational glue that binds distributed collections and different users together,” Zia explains. “The information must be updated and transmitted across all platforms. Eventually, you’re talking about translating it into different languages. The idea is that anyone, anywhere, can find and use this information.”
Six of the 29 projects begun in 2000 were devoted to pilot efforts in core integration, with grants going to institutions including Cornell, the University of Missouri-Columbia and the University of California, Berkeley.
Coordinators hope to have an operational version of the NSDL up and running by the fall of 2002. But, according to Arms, it could take 10 to 20 years to develop a portal such as the NSDL to its full potential. He adds that the foundation for such a website is within our technological means now, if “brute force computing” is used to manage and interpret information. But it will take time to perfect the technology behind the ultimate digital card catalog.