Analysis of big data that can reveal early signs of an Ebola outbreak or the first traces of a cyberattack requires a different kind of processor than has been developed for large-scale scientific studies. Since the data might come from disparate sources (say, medical records and GPS locations in the case of Ebola), they are organized in such a way that conventional computer processors handle them inefficiently.
Now, the military research organization DARPA has announced a new effort to build a processor for this kind of data — and the software to run on it. A group of computer scientists at the Department of Energy's Pacific Northwest National Laboratory will receive $7 million over five years to create a software development kit for big data analysis.
"Our software development kit will support a high-level, easy-to-use programming environment for both average and expert programmers," said computer scientist John Feo at PNNL. "We also expect it to achieve the program's goal of one thousand-fold improvement over current technology in data processing efficiency."
Conventional processors work best with structured data such as that found in science or an online store, with items arranged in tables of prices, descriptions and other categories. But for applications such as cybersecurity, tracking disease outbreaks or analyzing the power grid, data come from a variety of sources: emails, webpages or social media apps in the case of cybersecurity, or generating stations, transformers and homes in the case of the power grid.
This type of data, described as unstructured, is laid out as nodes linked by edges, like stars in constellations. In this arrangement, the nodes are the things themselves (the computers in a network or the power plants on the grid) and the edges represent the relationships among them (the Wi-Fi links between computers or the power lines on the grid). Together the nodes and edges form a structure called a graph, which the new hardware and software will be designed to process and analyze.
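For readers who want to see what such a graph looks like in practice, here is a minimal Python sketch. It is an illustration only and is not taken from the HAGGLE project: the device names, the `build_graph` and `hops` functions and the sample links are all hypothetical. It stores a small computer network as nodes and edges, then counts how many links separate two machines.

```python
from collections import defaultdict, deque

# Hypothetical example data: nodes are devices, edges are network links.
edges = [
    ("laptop", "router"),
    ("phone", "router"),
    ("router", "server"),
    ("printer", "server"),
]

def build_graph(edge_list):
    """Build an undirected adjacency list: node -> set of neighboring nodes."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def hops(graph, start, goal):
    """Return the fewest edges between start and goal, or None if unreachable.

    Uses breadth-first search, which visits nodes in order of distance.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

network = build_graph(edges)
print(hops(network, "laptop", "printer"))
```

Following a path like this means jumping from one node's neighbor list to the next, and those lists can sit anywhere in memory. That scattered access pattern, rather than a row-by-row pass over a table, is the kind of work the article says conventional processors handle inefficiently.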
Andrew Lumsdaine and John Feo will lead a team of researchers from the Northwest Institute for Advanced Computing and PNNL's High Performance Computing group on the HAGGLE project — Hybrid Attributed Generic Graph Library Environment. Read more about the HIVE program — Hierarchical Identify Verify & Exploit — at DARPA.