SYSTEMS
Turbocharge your job submissions!
Applications that run thousands of jobs can cause headaches. Huge numbers of job submissions to a site often cause bottlenecks, make system administrators grumpy and, worse, bring down remote gateway nodes, rendering the resources useless and losing jobs in the process. Traditional techniques commonly used in the scientific community do not scale to today’s largest grids and supercomputers, let alone tomorrow’s. But a new class of applications, called Many Task Computing and discussed in a recent article, has spawned development of a new framework, called Falkon, that enables applications to scale up quite painlessly and use these large systems efficiently.
Minutes to milliseconds
Falkon (Fast And Light-weight tasK executiON) is designed to help restructure applications to reduce network bandwidth use and to cut job wait times and job submission overheads from minutes to milliseconds. It leaves many of the higher-overhead features, such as accounting and persistence, for the local resource managers or the applications to handle. Falkon focuses on efficient handling of many independent tasks on large-scale distributed systems with many processors.
Falkon has demonstrated vast improvements in performance and scalability for a wide variety of tasks — tasks with execution times ranging from milliseconds to hours, compute- and data-intensive tasks, and tasks with varying arrival rates. The improvements extend across diverse applications from astronomy to medicine, economic modeling and beyond, and to scales of billions of tasks on hundreds of thousands of processors.
One researcher who adopted Falkon is Andrew Binkowski. Binkowski and his team model three-dimensional protein structures in their basic research towards drug design. Since proteins with similar structures tend to behave in similar ways, the team compares the modeled structures to existing, known proteins in order to predict their functions, a computationally intensive task.
“As the [repository of known proteins] expands almost exponentially, it becomes more difficult to coax desktop machines to do the types of analysis required,” says Binkowski. “We turned to Falkon as a way to utilize our existing software applications.”
What makes Falkon fly faster
The Falkon framework uses three novel techniques to enable rapid and efficient job execution and to improve application performance and scalability. First, multi-level scheduling, in which resource allocation for a job is separated from job dispatch, enables on-the-fly resource allocation and minimizes time spent in wait queues. Second, Falkon’s distributed, streamlined task dispatcher achieves dispatch rates ten to a thousand times those of conventional centralized schedulers. Third, Falkon’s data-aware scheduler coordinates tasks and data so that data transfer from shared or parallel file systems and across the network is minimized.
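The idea behind data-aware scheduling can be illustrated with a small sketch. This is not Falkon's code or API: the function name, the task format and the per-worker cap are all hypothetical, chosen only to show how a dispatcher might prefer workers that already hold a task's input file.

```python
from collections import defaultdict


def data_aware_assign(tasks, worker_cache, max_per_worker):
    """Assign tasks to workers, preferring workers that already cache the input.

    Illustrative only; every name and format here is an assumption.
    tasks: list of (task_id, input_file) pairs
    worker_cache: dict mapping worker name -> set of files cached on that worker
    max_per_worker: cap on tasks per worker
    Returns (assignment dict, number of file transfers needed).
    """
    # Work on a copy so the caller's cache description is not modified.
    cache = {w: set(files) for w, files in worker_cache.items()}
    load = defaultdict(int)
    assignment = {}
    transfers = 0
    for task_id, input_file in tasks:
        free = [w for w in cache if load[w] < max_per_worker]
        if not free:
            raise RuntimeError("no worker capacity left")
        # Prefer a worker holding the file; break ties by load, then by name.
        best = min(free, key=lambda w: (input_file not in cache[w], load[w], w))
        if input_file not in cache[best]:
            transfers += 1              # models a fetch from shared storage
            cache[best].add(input_file)
        assignment[task_id] = best
        load[best] += 1
    return assignment, transfers
```

With two workers, one of which already caches one of two input files, four tasks over those two files need only a single transfer in this toy model, because later tasks follow their data.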
We can ask bigger questions
“Falkon has allowed us to ask bigger questions and perform experiments on a scale never before attempted, or even thought possible,” says Binkowski. “This is the difference between comparing a newly determined protein structure to a family of related proteins versus comparing it to the entire protein universe.”
The team has done all of this with existing software packages that were not designed for high-throughput or many-task computing. Falkon coordinates and drives the execution of many loosely coupled computations, which it treats as “black boxes,” so no application-specific code modifications are needed.
“Whereas identifying similarities in protein binding pockets (for protein structure analysis) is characterized by millions of discrete jobs taking seconds to complete, docking and scoring a small-molecular compound (for drug discovery) can require several hours to converge on a solution. In both cases, we are able to tailor our workflows to achieve the best possible scientific results and still get the throughput and efficiency we need to take advantage of the large computing resources we have available.”
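The “black box” approach described above, in which unmodified programs are run as many independent tasks, can be sketched on a single machine. The example below is a local stand-in and an assumption throughout: it uses Python's standard thread pool and subprocess module rather than Falkon, and the helper names are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_black_box(command):
    """Run one unmodified program and return (exit code, trimmed stdout)."""
    done = subprocess.run(command, capture_output=True, text=True)
    return done.returncode, done.stdout.strip()


def run_many(commands, workers=4):
    """Dispatch independent commands to a small pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_black_box, commands))


def demo_commands(count):
    """Build toy commands; each stands in for an existing scientific executable."""
    return [[sys.executable, "-c", f"print({n} * {n})"] for n in range(count)]
```

Calling `run_many(demo_commands(5))` runs five short, unrelated processes and collects their exit codes and output. A real many-task framework would spread such tasks across many nodes instead of local threads.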
—Ioan Raicu and Ian Foster
Source: iSGTW http://www.isgtw.org/