Contact: Mr. Gordon Anderson
Infrastructure for Data Management, Processing, and Analysis
PRISM (Proteomics Research Information System and Management) manages what is most likely the largest quantity (~150 Terabytes) of raw and processed proteomic data that currently exists in a single facility.
Developed to provide a flexible foundation for effectively handling the flood of data generated from high-throughput analyses, PRISM has evolved over the years to become more robust, as well as to address the needs arising from data dissemination and applications. Importantly, input from application efforts has been used to guide development of tools and approaches that better facilitate data analysis.
PRISM architecture block diagram.
Currently, PRISM stores, tracks, and provides increasingly automated analyses of proteomics data. The system is designed with an open architecture that includes two major subsystems:
- A Data Management System (DMS) that extracts data from the mass spectrometer data acquisition systems, archives the raw data, and performs the first level of data analysis
- A Mass and Time tag System (MTS) that tracks AMT tags and identified peptides in LC-MS datasets.
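As a rough illustration of the kind of lookup a mass and time tag approach relies on, the sketch below matches LC-MS features to AMT tags when both the monoisotopic mass and the normalized elution time (NET) agree within a tolerance. It is not the MTS implementation: the class names, field names, tolerance values, and peptide labels are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmtTag:
    """Hypothetical AMT tag record: a peptide with a reference mass and NET."""
    peptide: str
    mono_mass: float  # monoisotopic mass in Da
    net: float        # normalized elution time, nominally in the range 0..1


@dataclass(frozen=True)
class Feature:
    """Hypothetical LC-MS feature observed in a dataset."""
    feature_id: int
    mono_mass: float
    net: float


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass error of `observed` relative to `reference`, in ppm."""
    return (observed - reference) / reference * 1e6


def match_features(features, tags, mass_tol_ppm=5.0, net_tol=0.02):
    """Map each feature ID to the peptides whose mass and NET both agree.

    The default tolerances are illustrative assumptions, not facility settings.
    """
    matches = {}
    for feature in features:
        matches[feature.feature_id] = [
            tag.peptide
            for tag in tags
            if abs(ppm_error(feature.mono_mass, tag.mono_mass)) <= mass_tol_ppm
            and abs(feature.net - tag.net) <= net_tol
        ]
    return matches


# Made-up example data.
tags = [
    AmtTag("PEPTIDEA", 1000.0000, 0.30),
    AmtTag("PEPTIDEB", 1500.0000, 0.55),
]
features = [
    Feature(1, 1000.0030, 0.31),  # about 3 ppm and 0.01 NET from PEPTIDEA
    Feature(2, 1500.0300, 0.55),  # about 20 ppm from PEPTIDEB: outside tolerance
    Feature(3, 2000.0000, 0.10),  # no tag nearby
]
result = match_features(features, tags)
```

A linear scan over all tags is used here for clarity; a production system holding many tags would presumably index them by mass first.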
PRISM's modular implementation facilitates scaling to increase capacity and makes the system readily adaptable to new software tools and algorithms, as well as to interfacing with advanced analysis and visualization tools. Because of this modular design, MTS can rapidly integrate new open-source LC-MS/MS data analysis tools.
DMS includes functionality for organizing and scheduling the large number of samples processed, including:
- The ability to track sample preparation progress
- The sample queue for LC-MS(/MS) analysis
- The specific LC separation column used for a given LC-MS(/MS) analysis, aiding in downstream sample processing and normalization.
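The three tracking functions listed above can be pictured with a small data model. The sketch below is only an illustration under assumed names (SampleRecord, PrepState, RunQueue, and the sample and column identifiers are all hypothetical, not DMS entities): each sample carries its preparation state, waits in a first-in, first-out queue, and is stamped with the LC column used when it is run.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PrepState(Enum):
    """Hypothetical sample preparation stages."""
    RECEIVED = "received"
    IN_PREPARATION = "in preparation"
    READY = "ready"


@dataclass
class SampleRecord:
    """Hypothetical tracking record for one sample."""
    sample_id: str
    prep_state: PrepState = PrepState.RECEIVED
    lc_column: Optional[str] = None  # filled in when the sample is analyzed


class RunQueue:
    """First-in, first-out queue of samples awaiting LC-MS(/MS) analysis."""

    def __init__(self) -> None:
        self._pending = deque()

    def enqueue(self, sample: SampleRecord) -> None:
        # Only fully prepared samples may enter the analysis queue.
        if sample.prep_state is not PrepState.READY:
            raise ValueError(f"{sample.sample_id} is not ready for analysis")
        self._pending.append(sample)

    def run_next(self, lc_column: str) -> SampleRecord:
        # Record which column was used so later normalization can group by it.
        sample = self._pending.popleft()
        sample.lc_column = lc_column
        return sample

    def __len__(self) -> int:
        return len(self._pending)


queue = RunQueue()
first = SampleRecord("S001", PrepState.READY)
second = SampleRecord("S002", PrepState.READY)
queue.enqueue(first)
queue.enqueue(second)
analyzed = queue.run_next(lc_column="Col_01")
```

Keeping the column identifier on the record itself is what would let downstream processing group or normalize results by column.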
We are continually improving the interaction between DMS and its users and administrators so that facility staff can support increasing levels of throughput. The ability to efficiently load tracking information into DMS from spreadsheets has also streamlined the data entry process, especially for large-scale experiments that involve many samples.
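To make the spreadsheet-loading idea concrete, the following sketch reads tracking rows from a spreadsheet exported as CSV and separates complete rows from rows with missing values. The column names, the load_tracking_rows function, and the sample data are assumptions for illustration; the actual DMS import format is not described here.

```python
import csv
import io

# Hypothetical column names; the real DMS spreadsheet layout may differ.
REQUIRED_COLUMNS = ("sample_id", "experiment", "lc_column")


def load_tracking_rows(handle):
    """Read tracking rows from a CSV file object.

    Returns (rows, errors): `rows` holds the complete records, and `errors`
    holds (line_number, [empty_columns]) for each incomplete record.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))

    rows, errors = [], []
    for row in reader:
        # Short rows yield None for absent cells, so normalize to "".
        cleaned = {c: (row[c] or "").strip() for c in REQUIRED_COLUMNS}
        empty = [c for c, value in cleaned.items() if not value]
        if empty:
            # reader.line_num is the source line of the row just read.
            errors.append((reader.line_num, empty))
        else:
            rows.append(cleaned)
    return rows, errors


# Made-up spreadsheet content; the second data row lacks a column identifier.
sheet = io.StringIO(
    "sample_id,experiment,lc_column\n"
    "S001,Exp_A,Col_01\n"
    "S002,Exp_A,\n"
    "S003,Exp_B,Col_02\n"
)
rows, errors = load_tracking_rows(sheet)
```

Reporting incomplete rows by line number, rather than stopping at the first one, lets a large batch be corrected in a single pass.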
Tutorials and explanations of the various features in DMS are now available on the PrismWiki website (accessible within PNNL), a supporting collection of web pages and documents that describe the high-throughput proteomic process, software tools, and staff at PNNL.
Lastly, data files tracked by DMS and stored in EMSL's NWfs archive are now readily available to any onsite researcher through Samba (SMB) file sharing. By leveraging the space available in the archive, this enhancement lets DMS-specific server disk space serve as a storage buffer rather than as primary active storage.