October 1, 2010
Journal Article

An efficient data format for mass spectrometry based proteomics

Abstract

The diverse range of mass spectrometry (MS) instrumentation, along with the corresponding proprietary and non-proprietary data formats, has led the proteomics community to call for a standardized format to facilitate the management, processing, storage, visualization, and exchange of both experimental and processed data. To date, significant effort has gone into standardizing XML-based formats for mass spectrometry data representation, despite the recognized inefficiencies associated with storing large numeric datasets in XML. The proteomics community has periodically entertained alternate strategies for data exchange, e.g., using a common application programming interface or a database-derived format. However, these efforts have yet to garner significant attention, mostly because they have not demonstrated significant performance benefits over existing standards, but also because of issues such as extensibility to multi-dimensional separation systems, robustness of operation, and incomplete or mismatched vocabulary. Here, we describe a format based on standard database principles that offers multiple benefits over existing formats in terms of storage size, ease of processing, data retrieval times, and extensibility to accommodate multi-dimensional separation systems.

Revised: December 16, 2010 | Published: October 1, 2010

Citation

Shah A.R., J.L. Davidson, M.E. Monroe, A.M. Mayampurath, W.F. Danielson, Y. Shi, A.C. Robinson, et al. 2010. An efficient data format for mass spectrometry based proteomics. Journal of the American Society for Mass Spectrometry 21, no. 10:1784-1788. PNNL-SA-69363. doi:10.1016/j.jasms.2010.06.014