Special Report: Computational Science — Behind Innovation and Discovery
Solving complex scientific problems requires not only high-performance supercomputers but also software that can discover patterns and integrate data across different scales of space and time. Researchers at PNNL are creating software and processes to do just that. Among them are:
Global Arrays Toolkit (GA Toolkit)—provides an efficient and portable "shared memory" programming interface for distributed-memory computers.
NWChem—a computational chemistry package that runs large chemistry problems efficiently and is used by thousands of people worldwide. NWChem was developed using the GA Toolkit and is designed to run on high-performance parallel supercomputers as well as conventional workstation clusters. It aims to be scalable in its ability to treat large problems efficiently and in its usage of available parallel computing resources.
ScalaBLAST—a program that processes genomic sequences in minutes rather than weeks. Genomes often contain millions of sequences, making them "data-intensive." ScalaBLAST was also developed using the GA Toolkit. ScalaBLAST scales well on both shared and distributed memory machines while scheduling queries across available process groups and sharing the target database across available memory. With ScalaBLAST, researchers can perform a whole proteome analysis on the human genome in 50 hours.
Fuel cell modeling—a process used to model fuel cell materials and fuel cell systems to better understand the fluid, thermal, electrochemical and structural response of fuel cells and to determine how they would perform before they are actually built. Researchers have developed electrochemistry codes, for example, that allow modeling of the steady-state response of solid oxide fuel cell (SOFC) stacks. SOFCs offer attractive benefits for energy production because of their high power density and fuel flexibility. Electrochemistry modeling is key to predicting the behavior of the cell, including its performance, fuel use, and thermal and flow characteristics.
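To make the "shared memory" idea behind the Global Arrays Toolkit entry in the list above more concrete, the following is a toy, single-process Python sketch. It is not the GA Toolkit API: the class name ToyGlobalArray and its put, get and owner methods are hypothetical, and the "processes" are simply Python lists. It only illustrates how a caller can address one global index range while the data is physically split into blocks held by different processes.

```python
# Toy, single-process illustration only; this is NOT the GA Toolkit API.
# All names here (ToyGlobalArray, put, get, owner) are hypothetical.

class ToyGlobalArray:
    """A 1-D array split into contiguous blocks, one per simulated process."""

    def __init__(self, length, num_procs):
        if length < 0 or num_procs < 1:
            raise ValueError("length must be >= 0 and num_procs >= 1")
        # Process p owns global indices [bounds[p], bounds[p + 1]).
        base, extra = divmod(length, num_procs)
        self.bounds = [0]
        for p in range(num_procs):
            size = base + (1 if p < extra else 0)
            self.bounds.append(self.bounds[-1] + size)
        # Each block stands in for the local memory of one process.
        self.blocks = [
            [0.0] * (self.bounds[p + 1] - self.bounds[p])
            for p in range(num_procs)
        ]

    def owner(self, i):
        """Return the simulated process that holds global index i."""
        for p in range(len(self.blocks)):
            if self.bounds[p] <= i < self.bounds[p + 1]:
                return p
        raise IndexError(i)

    def put(self, lo, values):
        """Write values starting at global index lo, wherever they live."""
        for offset, value in enumerate(values):
            i = lo + offset
            p = self.owner(i)
            self.blocks[p][i - self.bounds[p]] = value

    def get(self, lo, hi):
        """Read global indices [lo, hi) without the caller knowing the owners."""
        result = []
        for i in range(lo, hi):
            p = self.owner(i)
            result.append(self.blocks[p][i - self.bounds[p]])
        return result


# The caller works with global indices; the write below spans two blocks.
ga = ToyGlobalArray(length=10, num_procs=3)
ga.put(2, [1.0, 2.0, 3.0, 4.0])
print(ga.get(2, 6))
print(ga.owner(3), ga.owner(4))
```

In a real distributed-memory setting the reads and writes that cross block boundaries would involve communication between machines; the sketch hides that step entirely and shows only the addressing model.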
PNNL has formed a research alliance aimed at enabling a new generation of fast and efficient storage technologies for data-intensive computing. Part of a long-term collaboration between PNNL and Silicon Graphics, the alliance includes options for more than 2.5 petabytes of storage over the next two years (a petabyte equals one million billion bytes, enough to hold seven copies of the Library of Congress). PNNL will conduct research into "active storage," an effort to shift computation and transformation of data from client computers to storage devices. The effort holds the promise of dramatic productivity gains for a broad range of computing disciplines burdened with very large data sets.
Recently, the team developed a next-generation archive system that may help to accelerate scientific research in proteomics. The archive system is designed to store the large quantities of data generated by high-throughput proteomics research at PNNL and EMSL, particularly output from the facility's cutting-edge mass spectrometry instruments, which threaten to overwhelm current storage technologies.
The archive system combines storage and a Lustre™ file system, which is designed to serve very large clusters, with the PNNL-developed Active Storage, a technology that exploits underused computational resources of the Lustre™ servers where the contents of files are stored.
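The general idea of active storage described above, moving the computation to where the data is stored rather than moving the data to the client, can be illustrated with a toy Python sketch. This is not the PNNL Active Storage software or any Lustre™ interface: ToyStorageServer, run_on_server and count_peaks are hypothetical names, the file contents are made-up numbers, and "bytes sent" is a simple counter standing in for network traffic.

```python
# Toy illustration only; NOT the PNNL Active Storage or Lustre API.
# ToyStorageServer, run_on_server and count_peaks are hypothetical names.

class ToyStorageServer:
    """Holds files in memory and counts the bytes it sends to clients."""

    def __init__(self):
        self.files = {}
        self.bytes_sent = 0

    def store(self, name, data):
        self.files[name] = data

    def read(self, name):
        """Conventional path: ship the whole file to the client."""
        data = self.files[name]
        self.bytes_sent += len(data)
        return data

    def run_on_server(self, name, func):
        """Active-storage-style path: compute here, ship only the result."""
        result = func(self.files[name])
        # Assumption: the result travels as its text form.
        self.bytes_sent += len(str(result).encode())
        return result


def count_peaks(data, threshold):
    """Count newline-separated integer values greater than threshold."""
    return sum(
        1 for line in data.split(b"\n")
        if line and int(line.decode()) > threshold
    )


# Made-up sample values standing in for instrument output.
values = [5, 120, 33, 250, 7, 180]
server = ToyStorageServer()
server.store("run1.txt", "\n".join(str(v) for v in values).encode())

# Client-side processing: the whole file crosses the network.
client_result = count_peaks(server.read("run1.txt"), 100)
client_side_bytes = server.bytes_sent

# Server-side processing: only the small result crosses the network.
server.bytes_sent = 0
server_result = server.run_on_server("run1.txt", lambda d: count_peaks(d, 100))
server_side_bytes = server.bytes_sent

print(client_result, client_side_bytes)
print(server_result, server_side_bytes)
```

Both paths produce the same answer; the difference is how much data leaves the storage server, which is the cost that grows with file size in the conventional path.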