September 19, 2024
Report

DOE BSSD Performance Management Metrics Report Q3

Abstract

Microbiome data are complex, spanning information from microbial genomes within diverse communities, protein and metabolite readouts, and contextual information (metadata) captured from the environments from which the samples were collected. While the variety and scale of microbiome data generation have dramatically expanded over the past twenty years, infrastructure to support data management, sharing, and access has lagged. New ways to improve interoperability across existing resources and to advance community standards are necessary to support how researchers create, use, and reuse data. The National Microbiome Data Collaborative (NMDC) aims to advance a microbiome data sharing network through infrastructure, data standards, and community building. The NMDC leverages a federated data model, with multi-omics microbiome data and metadata hosted across various locations, and is centered around the NMDC schema to ensure microbiome data are findable, accessible, interoperable, and reusable (FAIR). Our schema serves as a unified data model that weaves together existing community standards and ontologies and uses persistent identifiers (PIDs) to provide globally unique, persistent, and machine-resolvable identifiers that connect data objects created within the NMDC infrastructure (e.g., studies, samples, and workflow runs). All NMDC data and metadata can be accessed through a user-friendly Data Portal and programmatically through a public Application Programming Interface (API). The research community can use the NMDC API to query and access biosample and workflow outputs, and we have provided tutorials for researchers to learn how to use it. NMDC’s overall software architecture thus supports the programmatic exchange and linking of data across DOE’s Biological and Environmental Research (BER) program User Facilities: the Joint Genome Institute (JGI) and the Environmental Molecular Sciences Laboratory (EMSL).
Our architecture also supports linking data with the BER resources Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) and DOE’s Systems Biology Knowledgebase (KBase), toward a larger FAIR data ecosystem. The NMDC serves as a data integration “hub” for standardized microbiome data for DOE’s BER program and beyond. Microbiome data generated at EMSL and JGI are available through the NMDC in collaboration with the primary research teams. The Submission Portal helps both legacy and prospective studies adhere to community standards and best practices. The NMDC also links to ESS-DIVE for archived environmental data and to KBase for future metabolic modeling efforts. Herein, we describe NMDC’s strategy to engage additional data and modeling resources to maximize microbiome data accessibility and interoperability.

Published: September 19, 2024

Citation

Eloe-Fadrosh E., P. Chain, S. Cholia, K. Fagnan, D.M. Mans, L. McCue, and C.J. Mungall, et al. 2024. DOE BSSD Performance Management Metrics Report Q3. Richland, WA: Pacific Northwest National Laboratory.
