ACADEMIA
Xilinx Helps University of Regensburg Launch the World's Most Power-Efficient Supercomputer
High-performance Virtex FPGAs provide critical network traffic management in a bespoke supercomputer that tops the Green 500 supercomputer list
Xilinx is applauded for its role in developing QPACE, a bespoke supercomputer built to unlock the mysteries of quantum chromodynamics (QCD). The predominant method for simulating QCD, known as lattice QCD, is only possible on high-powered, massively parallel supercomputers. Xilinx Virtex-5 LX110T FPGAs were selected to provide the core networking technology in QPACE, a two-year project that required leading-edge performance from commodity components.
Using a custom-design approach has made QPACE one of the most power-efficient supercomputers ever developed, with a peak performance per rack of 26 TFlops in double precision and 56 TFlops in single precision, and an average power consumption of 29 kW per rack. This puts QPACE at the top of the Green 500 list, a league table of the world's most energy-efficient supercomputers.
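As a rough check on the efficiency claim, the short sketch below divides the quoted peak performance by the quoted average power. It treats 26 TFlops as the double-precision peak and assumes that both it and the 29 kW figure refer to a single rack. Note that the Green 500 ranks machines on measured Linpack performance per watt rather than on peak performance, so this peak-based ratio is an upper bound, not the ranking metric itself.

```python
# Rough peak-efficiency estimate from the figures quoted in the article.
# Assumption: both the peak performance and the average power are per rack.

PEAK_DP_TFLOPS_PER_RACK = 26.0   # double-precision peak, TFlops
AVG_POWER_KW_PER_RACK = 29.0     # average power consumption, kW


def mflops_per_watt(peak_tflops: float, power_kw: float) -> float:
    """Convert a performance in TFlops and a power in kW into MFlops per watt."""
    mflops = peak_tflops * 1e6   # 1 TFlops = 1,000,000 MFlops
    watts = power_kw * 1e3       # 1 kW = 1,000 W
    return mflops / watts


if __name__ == "__main__":
    eff = mflops_per_watt(PEAK_DP_TFLOPS_PER_RACK, AVG_POWER_KW_PER_RACK)
    print(f"Peak double-precision efficiency: {eff:.0f} MFlops/W")
```

Under these assumptions the ratio comes out at roughly 900 MFlops per watt of peak double-precision performance per rack.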
Prof. Tilo Wettig, Ph.D., of the University of Regensburg, Germany, and Principal Investigator of the QPACE project, commented: "We have shown, for the first time, that FPGAs can be used as network coprocessors to interconnect very fast processors in a massively parallel supercomputer, scalable to thousands of nodes."
Each QPACE node card carries an IBM PowerXCell 8i processor, an enhanced version of the Cell Broadband Engine Architecture developed by Sony, Toshiba and IBM and first seen in the PlayStation 3. The processor integrates eight Synergistic Processing Elements (SPEs) and one Power Processing Element (PPE) on a single chip.
A rack holds up to 256 node cards, and a typical system comprises four racks, or 1,024 node cards, connected through three types of network. The QPACE team used Xilinx Virtex-5 LX110T FPGAs to implement the network processors (NWPs), which sit between the processing elements and the interconnection networks and manage all network traffic for optimal performance.
"QPACE represents over two years of effort from a number of leading technology providers," commented Patrick Lysaght, senior director, Xilinx Research Labs. "Xilinx FPGAs are well-known for providing key network processing capabilities in modern wired and wireless communications networks. This research is especially exciting because it demonstrates how low-latency, high-throughput connections can be achieved by using Virtex-5 devices to realize core networking functions in state-of-the-art, power-efficient supercomputers."
Low Latency
Lattice QCD algorithms typically exchange relatively small messages, so network latency has a major impact on efficiency. Commercial network technologies, while delivering enough bandwidth, often introduce unacceptable latencies in the region of 10 microseconds. Using FPGA technology, the team achieved a cell-to-cell latency of just 3 microseconds, with latencies as low as 0.5 microseconds in the optimised design of the three-dimensional torus network implemented in the Virtex-5 FPGA.
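Why latency matters so much for small messages can be seen from a simple first-order model in which the time to deliver a message is a fixed latency plus the message size divided by the link bandwidth. The sketch below compares the 10 microsecond and 3 microsecond latencies quoted above. The 1 GB/s bandwidth and the message sizes are hypothetical values chosen for illustration, not QPACE specifications.

```python
# First-order message-cost model: time = latency + size / bandwidth.
# Assumptions (hypothetical, for illustration only): a link bandwidth of
# 1 GB/s and the message sizes used in the loop below.

BANDWIDTH_BYTES_PER_S = 1e9  # hypothetical link bandwidth, 1 GB/s


def transfer_time_us(size_bytes: int, latency_us: float,
                     bandwidth: float = BANDWIDTH_BYTES_PER_S) -> float:
    """Total time in microseconds to deliver one message."""
    return latency_us + size_bytes / bandwidth * 1e6


def link_efficiency(size_bytes: int, latency_us: float,
                    bandwidth: float = BANDWIDTH_BYTES_PER_S) -> float:
    """Fraction of the total delivery time spent actually moving payload."""
    payload_us = size_bytes / bandwidth * 1e6
    return payload_us / transfer_time_us(size_bytes, latency_us, bandwidth)


if __name__ == "__main__":
    for size in (128, 1024, 1_000_000):
        for latency in (10.0, 3.0):
            eff = link_efficiency(size, latency)
            print(f"{size:>9} B, {latency:4.1f} us latency: "
                  f"efficiency {eff:6.1%}")
```

With these assumed numbers, a 1 KB message spends about 9% of its delivery time moving data at 10 microseconds of latency, against about 25% at 3 microseconds, while a 1 MB message is barely affected by either latency. Small messages, like those typical of lattice QCD, are therefore where lower latency pays off.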
RocketIO transceivers on the Virtex-5 FPGA were also instrumental in implementing the Rambus FlexIO interface that links the processor to the FPGA. This was described as a major accomplishment, because FlexIO is an extremely complicated interface to implement in FPGA technology. IBM, one of the major partners in the QPACE project, made significant engineering contributions, including its work on the Rambus FlexIO interface.
Prof. Wettig said that the FPGAs drastically reduced development time and risk, and cited the Virtex-5 FPGA's GTP transceivers as being of critical benefit. The torus network sustains 2.5 Gbit/s without bit errors across hundreds of node cards, and single node cards have been successfully tested at 3 Gbit/s. Prof. Wettig concluded: "Xilinx FPGAs were essential in the success of the QPACE project."