SCIENCE
SDSC deploys a new Appro supercomputer to accelerate the production of research results
New Data-Intensive Supercomputer Targets High Productivity for Users
Trestles is available to users of the TeraGrid, the nation’s largest open-access scientific discovery infrastructure. The system is among the five largest in the TeraGrid repertoire, with 10,368 processor cores, a peak speed of 100 teraflop/s, 20 terabytes of memory, and 38 terabytes of flash memory. One teraflop (TF) equals a trillion calculations per second, while one terabyte (TB) equals one trillion bytes of information.
“Trestles is appropriately named because it will serve as a bridge between SDSC’s unique, data-intensive resources available to a wide community of users both now and into the future,” said Michael Norman, SDSC’s director.
Configured by SDSC and Appro, Trestles is based on quad-socket, 8-core AMD Magny-Cours compute nodes connected via a QDR InfiniBand fabric. Each of its 324 nodes has 32 cores, 64 gigabytes (GB) of memory, and 120 GB of flash memory. Debuting at #111 on the Top500 list of supercomputers in the latest ranking, Trestles will work with and span the deployments of SDSC’s recently introduced Dash system and a larger data-intensive system named Gordon, which is to become operational in late 2011.
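The system-wide totals quoted for Trestles follow from the per-node figures. As a rough check, the short Python sketch below multiplies them out. It assumes the decimal convention the article states (one terabyte equals one trillion bytes, so 1 TB = 1,000 GB); the variable and function names are illustrative only and do not come from SDSC.

```python
# Per-node figures as quoted in the article.
NODES = 324
CORES_PER_NODE = 32
MEMORY_GB_PER_NODE = 64
FLASH_GB_PER_NODE = 120

# Assumption: decimal units, 1 TB = 1,000 GB, matching the article's definition.
GB_PER_TB = 1000


def system_total(nodes: int, per_node: int) -> int:
    """Return the system-wide total for a quantity given per node."""
    return nodes * per_node


total_cores = system_total(NODES, CORES_PER_NODE)
total_memory_tb = system_total(NODES, MEMORY_GB_PER_NODE) / GB_PER_TB
total_flash_tb = system_total(NODES, FLASH_GB_PER_NODE) / GB_PER_TB

print(f"cores:  {total_cores:,}")
print(f"memory: {total_memory_tb:.1f} TB")
print(f"flash:  {total_flash_tb:.1f} TB")
```

The core count comes out to exactly 10,368, and the memory and flash totals land just above the rounded 20 TB and 38 TB figures cited for the system.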
All three SDSC systems employ flash-based memory, which is common in much smaller devices such as mobile phones and laptop computers but unique for supercomputers, which generally use slower spinning disk technology.
“UCSD and SDSC are pioneering the use of flash in high-performance computing,” said Allan Snavely, associate director of SDSC and a co-PI for the new system. “Flash disks read data as much as 100 times faster than spinning disk, write data faster, and are more energy-efficient and reliable.”
“Trestles, as well as Dash and Gordon, were designed with one goal in mind, and that is to enable as much productive science as possible as we enter a data-intensive era of computing,” said Richard Moore, SDSC’s deputy director and co-PI. “Today’s researchers are faced with sifting through tremendous amounts of digitally based data, and such data-intensive resources will give them the tools they need to do so.”
Moore added that Trestles offers modest-scale and gateway users rapid job turnaround to increase researcher productivity, while also being able to host long-running jobs. Speaking of speed, SDSC and Appro brought Trestles into production in less than 10 weeks from initial hardware delivery. “We committed to getting the system in the hands of our users and meeting NSF’s production deadline,” noted Moore.
Early User Successes
Early users of SDSC’s Trestles include Bridget Carragher and Clint Potter, directors at the National Resource for Automated Molecular Microscopy at The Scripps Research Institute in La Jolla, Calif. Their project focuses on establishing a portal on the TeraGrid for structural biology researchers to facilitate electron microscopy (EM) image processing using the Appion pipeline, an integrated, database-driven system.
“We are very excited about this early opportunity to use the Trestles infrastructure for high performance structural biology projects,” said Carragher. “Based on our initial experience, we are optimistic that this system will have a dramatic impact on the scale of projects we can undertake, and on the resolution that can be achieved for macromolecular structure.”
Another early user is Ross Walker, an adjunct assistant professor of chemistry at UC San Diego and an assistant research professor with SDSC specializing in computational chemistry. “Typically, computational chemists need only a moderate number of cores, between 128 and 512, for longer periods of time,” he said. “This is exactly what Trestles was designed to offer.”
Walker’s group recently ran some simulations of the Adenovirus Protease, a key enzyme in Adenovirus replication and an interesting drug target for severe upper respiratory and stomach infections that currently have no remedy other than aspirin or some other anti-inflammatory.
Those calculations ran on 512 cores each, and the group was able to leave them running on Trestles almost unattended for two weeks. “Such ‘hands-off’ supercomputing greatly increases the productivity of my research team,” noted Walker.
TeraGrid User-Friendly
To ensure that productivity on Trestles remains high, SDSC will adjust allocation policies, queuing structures, user documentation, and training based on a quarterly review of usage metrics and user satisfaction data. Trestles, along with SDSC’s Dash and Triton Resource clusters, uses a matrixed pool of expertise in system administration and user support, as well as the SDSC-developed Rocks cluster management software. SDSC’s Advanced User Support has already established key benchmarks to accelerate user applications, and subsequently will assist users in tuning and optimizing applications for Trestles. Full details of the new system can be found on the Trestles webpage.
Walker’s team also recently ran a significant number of quantum geometry optimizations in support of a new force field it is developing for molecular dynamics, taking advantage of Trestles’ generous amount of memory and symmetric multiprocessing (SMP) cores, along with its streamlined scheduler policy. “We were able to get these runs completed in only a few days on Trestles.”
Trestles’ size, allocation range, and scheduling practices are expected to also benefit the emerging Science Gateway paradigm for high-performance computing system access. Science gateways are a relatively recent phenomenon in supercomputing. Currently led by Nancy Wilkins-Diehr of SDSC, the TeraGrid Gateway program began in 2004 as web portals designed and used by scientists. The program extends the analysis capabilities of these community-designed interfaces through the use of supercomputers, yet insulates users from supercomputing complexities.
During the final quarter of 2010, gateway users represented 42% of all researchers who ran jobs on the TeraGrid during that period, reflecting a steady growth in the number of users accessing high-end resources. Trestles’ policies are designed to meet the needs of that increasing user base.
NSF’s award to build and deploy Trestles was announced last August by SDSC, and Trestles will be available to TeraGrid users through 2013. In November 2009, SDSC announced a five-year, $20 million grant from the NSF to build and operate Gordon, the first high-performance supercomputer to employ a vast amount of flash memory. Dash, a smaller prototype of Gordon, was deployed in April 2010. All these systems are being integrated by Appro and use a similar design philosophy of combining commodity parts in innovative ways to achieve high-performance architectures.