Reflecting chemical intuition
By J. William Bell, NCSA
University of Illinois researcher Todd Martinez teams with NCSA to speed chemistry simulations and improve the way the results are stored and studied.
On data analysis and mining projects, scientists often talk about saving themselves from "drowning in data." There's too much for a human to look at, too much for an entire research team to go over in the sum total of everyone's lifetime. But before a person can drown, someone has to turn on the water. Todd Martinez wants to open the floodgates and fashion the life preserver.
Martinez is a chemistry professor at the University of Illinois and a 2007-2008 NCSA Faculty Fellow, not to mention a MacArthur Fellow. Martinez and his research group simulate the way fundamental chemical compounds change (from one structural form to another, for example) when hit by light. They look at these processes at the quantum and the atomic level, considering both how electrons behave within atoms and how atoms behave within larger compounds.
Once researchers understand these essential functions, they can begin to "move molecules and encourage them to act in different ways—take light and convert it to energy at the molecular scale," according to Martinez. If they can do that, they can reproduce photosynthesis or dramatically improve solar cells.
"We want to change the paradigm from simulating chemical reactions to designing them," Martinez says. That will require simulations that run 1,000 times faster than today's.
"How do you get that speed up?" he asks. "Novel computational architectures."
Something so novel
A two-minute preview video of Todd Martinez's talk, "Challenges in the Simulation of Chemical Dynamics."
Using those novel architectures is often challenging. The graphics processing units (GPUs) in game consoles such as the PlayStation can be powerful tools, for example. But working with them, especially in their native state, "is fun for about a week, then it just turns into a lot of work," Martinez says. Fortunately, novel architectures are the specialty of a growing group of researchers at NCSA and the University of Illinois.
In early 2007, the Martinez team began working with Volodymyr Kindratenko and others in NCSA's Innovative Systems Laboratory, which evaluates new computing systems that are likely to significantly decrease the cost of computational simulation or to significantly increase its power.
Students in the Martinez group also spent part of that year in a course led by Wen-mei Hwu, an electrical and computer engineering professor at Illinois. Hwu is a co-principal investigator on the Blue Waters project. Blue Waters—led by NCSA in collaboration with Illinois, IBM, and partners around the country—is expected to be the first sustained-petascale computing system for open scientific research when it comes online in 2011.
The course focused on what was then a newly released software development kit for GPUs from NVIDIA, a leading GPU manufacturer. The kit allows scientists to run codes like those used by computational chemists on hardware that has traditionally been used exclusively to generate images of video game race cars and space fighters.
With the skills from that course and the expertise of NCSA's Innovative Systems Lab, the team began running small pieces of code on a variety of novel architectures. To date, they've ported this code to GPUs, to FPGAs (field-programmable gate arrays, whose basic logic blocks can be set and reset), and to the Cell processor (on which the Sony PlayStation®3 is based).
Martinez and graduate student Ivan Ufimtsev published the GPU work in the February 2008 issue of the Journal of Chemical Theory and Computation. Work on a cluster of Cell processors at NCSA was covered in New Scientist the same month.
The results are preliminary but impressive. A small "toy" simulation of 64 hydrogen atoms interacting with one another runs 100 to 200 times faster on GPUs than on conventional supercomputing platforms. A more complicated calculation, a strand of DNA 256 atoms in size, ran 80 times faster. "On real calculations instead of toys, the factors are already always better than 25 times," Martinez says.
NCSA's Kindratenko explains, "The idea is to understand what sort of performance one is to expect from various accelerator technologies and what sort of problems map well onto these accelerators."
The Martinez team and NCSA are now selecting additional codes to port to the architectures. They are also considering moving some codes to NCSA's new GPU cluster, which will allow them to explore methods of running the codes across multiple GPUs at the same time.
Keeping afloat
Work on innovative systems creates the flood of data. But after the flood, what? The Martinez team is working on two Web-based systems to keep researchers afloat, SimDB and AnSim.
SimDB is a combination of third-party and in-house tools that automatically archives and tracks simulations created on traditional computers and novel architectures. As part of the Faculty Fellows Program, the team is looking to leverage aspects of NCSA's GridChem, an application focused more on generating quantum chemistry simulations using the grid. They hope to integrate some of the data parsers used within GridChem, as well as some of the standard data formats.
AnSim, which is in its earliest phases, will be designed to sort simulations that have inputs and results that are similar to one another. That will make the data easier to navigate, exposing connections among work done by different people or different teams. Ultimately, the team hopes it will also allow for self-organizing simulations. Data will not only be archived automatically, it will be assessed automatically, and new, better targeted simulations will be spawned.
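The article does not describe how AnSim will decide that two simulations are "similar," so the following Python is only a rough sketch of the general idea, not the team's method. It assumes each simulation can be summarized as a short tuple of numbers and groups together runs that fall within a chosen distance of one another. The function name `group_similar`, the run names, the two features, and the threshold are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def group_similar(sims, threshold):
    """Greedy grouping (an illustrative assumption, not AnSim's algorithm).

    A simulation joins the first group that already holds a member
    within `threshold` of it; otherwise it starts a new group.
    """
    groups = []
    for name, features in sims.items():
        for group in groups:
            if any(dist(features, sims[other]) <= threshold for other in group):
                group.append(name)
                break
        else:
            # No existing group was close enough.
            groups.append([name])
    return groups


# Hypothetical records from different students:
# (excitation energy in eV, final torsion angle in degrees).
# A real system would rescale features so no single unit dominates.
sims = {
    "alice_run1": (4.6, 178.0),
    "bob_run7": (4.7, 176.5),
    "alice_run2": (5.9, 12.0),
    "carol_run3": (6.0, 10.5),
}

groups = group_similar(sims, threshold=2.0)
print(groups)
```

Under these made-up numbers, runs by different people land in the same group when their summaries are close, which is the kind of cross-project connection the paragraph above describes.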
"Single students [within an academic research group] are generating more data than they can keep track of now. We wouldn't have dreamed of that 20 years ago…but the computations have gotten that big," Martinez says. Personal, unique methods of organizing the data are a hindrance, and the sheer size and number of simulations "makes it hard to see connections among projects. What we want is to reflect the chemical intuition that people bring."