ACADEMIA
With Shared Memory in Pittsburgh, XSEDE Expands Horizons for Supercomputer Research
Times are changing for supercomputer (high-performance computing) research, as non-traditional fields of study have begun taking advantage of powerful supercomputer tools. This was part of the plan when the National Science Foundation’s XSEDE (Extreme Science and Engineering Discovery Environment) program launched in July 2011. In recent months, the program took big steps toward this objective: a number of non-traditional projects — their common denominator being the need to process and analyze large amounts of data — were awarded peer-reviewed allocations of time on XSEDE resources.
“We’re happy to see these proposals succeed,” said Sergiu Sanielevici of the Pittsburgh Supercomputing Center (PSC), who leads XSEDE’s program in Novel and Innovative Projects (NIP). “As a brand new initiative in XSEDE, the NIP team proactively stimulates the development of strong projects that focus on interesting kinds of research that differ from the more typical simulation and modeling applications that have dominated HPC research in previous decades.”
The projects Sanielevici refers to involve:
• Creating searchable access to hand-written census data going back to the 1940s (Kenton McHenry, University of Illinois at Urbana-Champaign),
• Analyzing huge quantities of finance-trading data, the volume of which has rapidly increased beyond the computational power of prior approaches to trade-data research (Mao Ye, University of Illinois at Urbana-Champaign),
• Assembling DNA segments from fungi in soil to identify new enzymes that can cost-effectively convert plant material to biofuel (Mostafa Elshahed, Oklahoma State University),
• Simulating the World Wide Web to discern which of many proposed protocols to make the Internet secure works best against various kinds of attacks (Sharon Goldberg, Boston University), and
• Applying sophisticated “machine learning” algorithms to discern meaning from huge amounts of online text data (Noah Smith, Carnegie Mellon University).
All of these non-traditional supercomputer projects sought and received large allocations on XSEDE’s Blacklight resource at PSC, an SGI Altix UV 1000 system partitioned into two connected 16-terabyte shared-memory systems, the two largest shared-memory systems in the world.
Shared-memory resources such as Blacklight present a large advantage for many data-intensive applications, says Sanielevici, because of “efficient fine-grained random access” — all of the system’s memory can be directly accessed from all of its processors, as opposed to distributed memory (in which each processor’s memory is directly accessed only by that processor). Because all processors share a single view of data, a shared-memory system is, relatively speaking, easy to program and use.
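As a rough illustration (not drawn from the article), the short OpenMP sketch below shows the programming model Sanielevici describes: every thread reads and writes a single shared array directly, with no explicit data movement between processors, which is what makes a shared-memory system comparatively easy to program. The array size and variable names here are arbitrary, chosen only for the example.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    /* Illustrative sketch: one array visible to all threads; any thread
       may touch any element without message passing or data transfer. */
    int main(void) {
        static double data[N];
        double sum = 0.0;

        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            data[i] = (double)i;     /* each thread writes its share of the shared array */

        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++)
            sum += data[i];          /* fine-grained random access to the same memory */

        printf("sum = %f\n", sum);
        return 0;
    }

On a distributed-memory machine, the same computation would require explicitly partitioning the array across nodes and exchanging messages to combine results; on a shared-memory system like Blacklight, the runtime simply schedules threads over the processors that all see the same data.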
Elshahed’s project also received allocations on XSEDE’s Forge system at the National Center for Supercomputing Applications, and Ye’s project also received an allocation on the new Gordon system at the San Diego Supercomputer Center. McHenry’s and Ye’s projects were recommended to receive extended collaborative support from XSEDE staff experts, subject to the formulation of project plans.
Blacklight has proven to be especially useful in genomics assembly projects, such as Elshahed’s, as highlighted in a recent article in the bioinformatics journal GenomeWeb [http://psc.edu/publicinfo/pdf/GenomeWeb_BioInformatics_021312.pdf], which states that “genomics researchers might be hard pressed to do better . . . .”
Brian Couger of Oklahoma State, a Ph.D. candidate working with Elshahed, recently completed a “metagenomics” assembly of 3 billion DNA segments on Blacklight. He believes it is the largest metagenomics assembly accomplished to date and says it could not have been done on systems other than Blacklight.
Mao Ye’s work with Blacklight has examined how lack of transparency in certain kinds of finance trading can skew the market. Because of the quantity of data involved, the problem is very difficult to analyze. He notes that it took several months for the U.S. Securities and Exchange Commission to analyze just two hours of trade data and that Blacklight has greatly improved the ability to produce timely analysis.
With Blacklight, Noah Smith’s research group at CMU has been able to “train” large-scale semantic models of natural language using millions to billions of words. Blacklight’s parallelism and shared memory allow Smith and collaborators to take advantage of much more powerful and complex algorithms than have previously been applied to such large datasets.
XSEDE, the Extreme Science and Engineering Discovery Environment, is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is a single virtual system that scientists and researchers can use to interactively share computing resources, data, and expertise. XSEDE integrates the resources and services, makes them easier to use, and helps more people use them. The five-year, $121 million project is supported by the National Science Foundation, and it replaces and expands on the NSF TeraGrid project.