EARTH SCIENCES
DataMiningGrid tools and services release is now available
- Written by: Writer
DataMiningGrid tools and services version 1.0 beta have been released, the DataMiningGrid Consortium announced on March 3. The partners in this effort were the University of Ulster (United Kingdom), the University of Ljubljana (Slovenia), the Technion (Israel), the Fraunhofer Institute (Germany) and the automobile company DaimlerChrysler. The DataMiningGrid software is now available in source and binary form under the open source Apache License v2.0. It is general-purpose software that facilitates user-friendly, Grid-based data mining. It requires Globus Toolkit version 4 and the Triana Workflow Editor and Manager; the Condor local scheduler is optional. Detailed information can be found at www.datamininggrid.org.
The real power of Grid computing lies in sharing resources across a network. These resources can be CPU cycles, storage, peripherals, network bandwidth, data and software. Ultimately, this will lead to the grand goal envisioned by Grid researchers, in which Grid users can seamlessly access and harness geographically distributed computing resources as if they were using a local system.
However, "trust, security, data privacy and reliability [or quality of service] in Grid computing is still a largely unresolved problem," says Dr. Werner Dubitzky, professor of bioinformatics at the School of Biomedical Sciences at the University of Ulster, and co-coordinator of the EU-IST-funded DataMiningGrid project. "These issues are particularly important when commercial computing jobs are distributed across sites not belonging to the company that issued the jobs."
The DataMiningGrid Consortium investigated many of these problems for the specific field of, predictably, data mining. This focus matters for two reasons: data mining is a technology developed to analyze and interpret large quantities of data, and it is one of the most powerful technologies used in engineering, astronomy, finance and the biological sciences.
"One of the key objectives of the project," said Vlado Stankovski, DataMiningGrid technical manager, "was to build technologies that facilitate the Grid-enabling of data mining technology, ranging from data pre-processing, analysis and post-processing techniques, even if these intrinsically reside in widely dispersed locations. It is hoped that this technology will eventually help to improve the effectiveness and performance of data mining applications and provide a much wider access to data mining technology."
By relying on mature or near-mature tools for scheduling, workflow management, and data access and integration, DataMiningGrid avoids reinventing the wheel and focuses on the core problem: extracting relevant information from vast data sets across a grid.
The project faced several critical challenges. First, the requirements for data mining applications varied widely across domains and sectors, and bringing them all under a unified system architecture was difficult. Second, in many data mining problems the data must remain at its source, because of its volume, for privacy, or for other reasons. In such cases, the analysis must be executed close to where the data resides. In addition, one logical data set may be physically distributed across different locations.
"We are happy to have made a strategic decision to move to open source, and make available the results of DataMiningGrid project to the general public. This will foster competition in the area. We now accept new entities to join the existing (academic) test bed, which is based on Globus toolkit 4 and DataMiningGrid WSRF-compliant technology. We also welcome researchers and developers to join the DataMiningGrid community mailing list," said Stankovski.
The DataMiningGrid software is an important effort toward realizing the true potential of Grid computing.