ACADEMIA
LBNL’s Evaluation of Earth Simulator Performance Nominated for Best Paper Award
With the re-emergence of viable vector computing systems such as the Earth Simulator and the Cray X1, there is renewed debate about which architecture is best suited for running large-scale scientific applications. In order to cut through the conflicting claims of fastest, biggest, etc., a team led by Lenny Oliker of the U.S. Department of Energy’s Lawrence Berkeley National Laboratory put five different systems through their paces, running four different scientific applications key to DOE research programs. As part of the effort, the group became the first international team to conduct a performance evaluation study of the 5,120-processor Earth Simulator. The team also assessed the performance of:
• the 6,080-processor Power3 IBM supercomputer running AIX 5.1 at the NERSC Center at Lawrence Berkeley National Laboratory
• the 864-processor Power4 IBM supercomputer running AIX 5.2 at Oak Ridge National Laboratory (ORNL)
• the 256-processor SGI Altix 3000 system running 64-bit Linux at ORNL
• the 512-processor Cray X1 supercomputer running UNICOS at ORNL.
The results of the comparison are of great interest to the HPC community. The team’s paper was accepted for the SC2004 conference, then nominated for Best Paper. The winning paper will be announced at the conference in November. Oliker will present his paper at SC2004 at 1:30 p.m. Tuesday, November 9, in Rooms 317-318 of the David L. Lawrence Convention Center in Pittsburgh, Pa.
In addition to Oliker, the team includes Julian Borrill, Andrew Canning, Jonathan Carter and John Shalf, all of LBNL, and Stephane Ethier of DOE’s Princeton Plasma Physics Laboratory.
“This effort relates to the fact that the gap between peak and actual performance for scientific codes keeps growing,” said Oliker, who won the Best Paper Award at SC99. “Because of the increasing cost and complexity of HPC systems, it is critical to determine which classes of applications are best suited for a given architecture.”
In their abstract, the group members write, “Computational scientists have seen a frustrating trend of stagnating application performance despite dramatic increases in the claimed peak capability of high performance computing systems. This trend has been widely attributed to the use of superscalar-based commodity components whose architectural designs offer a balance between memory performance, network capability, and execution rate that is poorly matched to the requirements of large-scale numerical computations.”
The four applications and research areas selected by the team for the evaluation are:
• Cactus, an astrophysics code that evolves Einstein’s equations of general relativity using the Arnowitt-Deser-Misner (ADM) method
• GTC, a magnetic fusion application that uses the particle-in-cell approach to solve non-linear gyrophase-averaged Vlasov-Poisson equations
• LBMHD, a plasma physics application that uses the Lattice-Boltzmann method to study magnetohydrodynamics
• PARATEC, a first-principles materials science code that solves the Kohn-Sham equations of density-functional theory to obtain electronic wave functions.
So, what are the team’s conclusions?
“The four applications successfully ran on the Earth Simulator with high parallel efficiency,” Oliker said. “And they ran faster than on any other measured architecture, generally by a large margin.”
However, Oliker added, only codes that scale well and are suited to the vector architecture may be run on the Earth Simulator.
“Vector architectures are extremely powerful for the set of applications that map well to those architectures,” Oliker said. “But if even a small part of the code is not vectorized, overall performance degrades rapidly.”
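The sensitivity Oliker describes follows from an Amdahl’s-law-style argument; the numbers here are illustrative and are not taken from the study. If a fraction f of a code’s work vectorizes and the vector units run that portion v times faster than scalar execution, the overall speedup is limited to 1 / ((1 - f) + f / v). For example, with v = 20, a code that is 95 percent vectorized speeds up by only about 10x, and one that is 90 percent vectorized by roughly 7x, so even 5 percent of unvectorized code can cut the achievable speedup in half.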
As with most scientific inquiries, the ultimate solution to the problem is neither simple nor straightforward.
“We’re at a point where no single architecture is well suited to the full spectrum of scientific applications,” Oliker said. “One size does not fit all, so we need a range of systems. It’s conceivable that future supercomputers would have heterogeneous architectures within a single system, with different sections of a code running on different components.”
The team’s full paper can be found at http://crd.lbl.gov/~oliker/papers/SC04.pdf .
Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California. Visit our website at http://www.lbl.gov .