LSU researchers awarded nearly $1 million for big data research
- Written by: Tyler O'Neal, Staff Editor
- Category: BIG DATA
LSU's Seung-Jong Park, associate professor of computer science with a joint appointment in the Center for Computation & Technology, or CCT, along with co-investigators Joel Tohline, Sean Robbins, Lonnie Leger, K. Gus Kousoulas and other senior LSU faculty, recently received an NSF grant of $947,860 for a campus-wide project aimed at bringing "Big Data" computational capabilities to separate university research groups. Samsung Electronics is also participating in the project as an industrial collaborator.
The project, titled "CC-NIE Integration: Bridging, Transferring and Analyzing Big Data over 10Gbps Campus-Wide Software Defined Networks," will empower scientific breakthroughs at LSU by providing researchers with advanced information technologies and cyberinfrastructure.
"Big Data is a very hot term right now," Park said. "Genome sequencing is one of the major drivers for Big Data research. It is not unusual to produce many terabytes of data in sequencing the human genome, or trillions of digital information bytes. But processing terabytes of data has been a headache for researchers using their own equipment."
Genome sequencing, which involves determining the exact sequence of an organism's hereditary molecule, DNA, has many applications in biological and medical research, including personalized medicine. However, sequencing genomes and comparing them across organisms requires large amounts of data processing. The human genome, for example, contains about three billion molecular units, or base pairs, like three billion beads on a string arranged in a specific order. Assembling this amount of data, or even assembling shorter genome sequences like those of West Nile virus or HIV, the virus that causes AIDS, requires massive computational power and data storage capabilities.
"But that kind of problem can be solved with our solutions," Park said. "Louisiana has a huge amount of computational power. We have supercomputers at LSU. We recently purchased SuperMike-II, which might become a super-class supercomputer. With this project, we can help researchers bring their data to bigger machines, not just one or two computers."
Kousoulas has been particularly interested in Big Data hardware and software infrastructure for biomedical applications.
"The advent of human genome sequencing is poised to revolutionize medicine in the future, since specific risks associated with changes in the human genome can be accurately predicted ahead of any disease symptoms, allowing physicians to deliver protective and preventative therapies," he said.
Until now, Big Data researchers at LSU have not been able to take advantage of LSU's supercomputers. Research involving genome sequencing, drug discovery, and the modeling and prediction of coastal hazards has traditionally required LSU researchers to devote large amounts of time and money to acquiring their own high-throughput computational equipment.
"Right now, with current technology, LSU's supercomputers are not adapted for Big Data," Park said.
But that is set to change. Park and colleagues are building a high-speed intra-campus network that will connect separate lab groups on campus to LSU's primary supercomputer facility.
Samsung is collaborating with LSU on the high-speed network-building phase of the project, helping to establish high-speed networks and large memory storage units on campus in order to handle the massive amounts of data generated by Big Data applications. Samsung has donated 70 terabytes of solid state disk storage to LSU for this project.
"With this network, after researchers produce their data, they could send it over our 10 [gigabit]-per-second network to LSU supercomputing clusters," Park said.
The concept is similar to cloud computing: instead of needing their own high-performance computers, LSU researchers will be able to send all of their scientific data over the network to be processed and analyzed automatically by a large number of connected computers across campus.
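To give a rough sense of scale, the short sketch below estimates how long a data set would take to cross a link running at the 10 Gbps named in the project title. It is a back-of-the-envelope illustration, not part of the LSU project's software: the function name, the decimal definition of a terabyte and the optional efficiency factor are all assumptions made for this example, and real transfers are usually slower because of protocol overhead and disk speeds.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float,
                     efficiency: float = 1.0) -> float:
    """Idealized time, in seconds, to move size_bytes over a network link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for the
    share of the raw link rate that actually carries data.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Links are rated in bits per second, data sets in bytes: 8 bits per byte.
    return size_bytes * 8 / (link_bits_per_second * efficiency)


TERABYTE = 10**12          # assumption: decimal terabyte (10^12 bytes)
LINK_10_GBPS = 10 * 10**9  # 10 gigabits per second, as in the project title

one_tb = transfer_seconds(TERABYTE, LINK_10_GBPS)
seventy_tb = transfer_seconds(70 * TERABYTE, LINK_10_GBPS)

print(f"1 TB:  {one_tb / 60:.1f} minutes")
print(f"70 TB: {seventy_tb / 3600:.1f} hours")
```

Under these assumptions, one terabyte takes about 13 minutes at full line rate, and a data set the size of the 70 terabytes of storage Samsung donated would take roughly 15 and a half hours.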
"The reviewers of this proposal at the national level understand that LSU houses the unusual combination of talents that is required to make complex and innovative projects of this type successful," said Joel Tohline, CCT director. "LSU not only has a diverse group of scientists and engineers that can take advantage of this new capability, but also highly skilled technical staff who can implement the new cyberinfrastructure."
But hardware is not the only challenge in adapting LSU supercomputers to Big Data research. Using funds from the NSF grant, Park and colleagues will also be developing unique software that researchers will be able to use to harness supercomputer computational power for data-intensive applications. For example, the software will need to be modified to run coastal models created by researchers in the School of the Coast & Environment and in the College of Engineering for hurricane and sea level rise predictions.
"Without software, it is difficult for most Big Data research groups to utilize supercomputers for their research applications," Park said. "With our NSF grant, we are enabling this kind of research at LSU. We will be able to transport data from right in front of LSU researchers' laboratories to supercomputing facilities on campus. This grant is finally connecting existing infrastructures together to enable Big Data research that is not currently possible."
And this is just the beginning. Park sees the potential for extending this "Big Data network" to Louisiana as a whole, bringing LSU's supercomputing cluster capabilities to research projects at Pennington Biomedical Research Center, Tulane and other institutions interested in Big Data.
"The cyberinfrastructure developed at LSU with this NSF funding can serve as a model for facilitating and promoting biomedical and other research collaborations among all LSU campuses and other institutions in Louisiana in the future," Kousoulas said.