SCIENCE
'Semantic Supercomputing' Quickly Scours the World's Patent Databases
Vienna-based Matrixware Information Services is using Star-P software from Interactive Supercomputing, Inc. (ISC) to tackle the ever-growing challenge of finding patent information hidden in the world's vast patent databases and libraries. The Austrian company employs a team of computer engineers, mathematicians, linguists and patent specialists to help companies mine patent repositories for intellectual property information. It combines natural language processing (NLP) algorithms with what it calls semantic supercomputing to retrieve relevant patent information faster, more easily and at lower cost.
Patents and intellectual property play an increasingly important role as intangible assets of industrial corporations. Some 60 million patents have been awarded around the world, and the yearly number of new filings is on the rise. Over 250,000 companies worldwide depend on patent data. Consequently, professional management of patents and precise retrieval of patent information are essential business processes for industries around the globe.
To meet this need, Matrixware employs multi-core high-performance computers from SGI and Star-P's interactive parallel computing capabilities to develop and run its NLP algorithms on enormous, terabyte-scale patent data sets. Star-P enables Matrixware's team to continuously code and refine NLP algorithms on their desktops using MATLAB, a popular mathematical tool, and then run them instantly and interactively on parallel computers with little to no modification. Star-P eliminates the need to re-program applications in C or Fortran, or with MPI, in order to run on parallel systems, resulting in huge productivity gains.
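The idea of writing an analysis once and then running it serially or in parallel without changing it can be illustrated with a minimal sketch. This is not Star-P or MATLAB code and does not reflect Matrixware's actual algorithms: it is a Python analogy in which the function names (`term_counts`, `corpus_counts`), the `map_fn` parameter and the two sample texts are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import re


def term_counts(text):
    """Count lowercase word tokens in a single document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def corpus_counts(docs, map_fn=map):
    """Merge per-document counts; map_fn decides how the work is distributed."""
    total = Counter()
    for counts in map_fn(term_counts, docs):
        total.update(counts)
    return total


# Hypothetical toy texts standing in for patent abstracts.
DOCS = [
    "A method for parallel search of patent text",
    "Patent search using semantic text analysis",
]

# Serial run: the built-in map processes one document at a time.
serial = corpus_counts(DOCS)

# Concurrent run: the same analysis code, with only the map function swapped.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = corpus_counts(DOCS, map_fn=pool.map)
```

Only the mapping strategy changes between the two runs; the analysis code itself is untouched. A thread pool keeps the sketch self-contained, and for genuinely CPU-bound work a process pool or a cluster scheduler would likely take its place.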
Matrixware's Alexandria System is the central store for both the raw data and the enriched data. Data access is modeled on well-established library science methods and embedded in a workflow system. The Alexandria server also provides users with exact and constantly updated document counts for the collections they are retrieving from.
Recursively generating metadata from data and metadata from metadata, the various refinement processes let the information store grow and allow the user community to actively "Cultivate the Corpus." As a development and front-end framework, Matrixware created an extensible software infrastructure, the "Leonardo" Ecosystem. Within this framework, technologists can simultaneously create and refine new tools and use the community channel to communicate with their end-users. The benefit for end-users, in turn, is a closer match between the tools and their actual information needs and existing workflows.
"Matrixware processes patent data by its meaning in context to turn it into valuable information for our clients. Our purpose is to boost their productivity and open up new opportunities for them using intellectual property information," says Francisco Webber, Matrixware's managing director. "But while our scientists are experts in information retrieval, they are not parallel programming experts. Star-P enables them to tap the power of parallel HPCs to refine and run their natural language processing applications as well as to improve the data quality of our patent databases."
"Despite the massive growth of patent information over the last several decades, researchers still search the way they did 30 years ago," says David Gibson, ISC's vice president of business development. "Matrixware's NLP technology and semantic supercomputing breakthroughs are turning patent information retrieval into a huge competitive advantage for companies whose success hinges on intellectual property discovery and protection."