WHONDRS ICON-FAIR Framework
WHONDRS is built to be Integrated, Coordinated, Open, and Networked (ICON). Open includes aligning with FAIR data principles so that WHONDRS data are Findable, Accessible, Interoperable, and Reusable. Together, the ICON-FAIR principles realize the U.S. Department of Energy (DOE) Biological and Environmental Research (BER) vision for open watershed science by design, which is laid out in a 2019 DOE workshop report. That report describes WHONDRS as a use case.
What are ICON-FAIR principles, and how does WHONDRS embody them?
- Integrated processes across traditional disciplines (i.e., physical, chemical, and biological) and scales
- WHONDRS emphasizes connections among microbial community composition and function (biology), major ions and detailed properties of organic matter (chemistry), and surface and subsurface hydrology (physical)
- WHONDRS has targeted a range of temporal scales, including short-interval time series (sampling every 3 hours during the 48 Hour Diel Cycling Study), constrained periods (6 weeks for the Summer 2019 Sampling Campaign; 8 weeks for Perturbation Response Traits), and open-ended sampling over multiple years (Global Metabolite Bio-chemo-geography)
- The spatial scale of studies ranges from the contiguous United States to global. Some studies collect samples at a single location within a site (Global Metabolite Bio-Chemo-geography). Others collect from three nearby locations for certain analyses (48 Hour Diel Cycling Study; Summer 2019 Sampling Campaign) or from up to 10 locations along a transect (Perturbation Response Traits).
- Coordinated use of consistent protocols across systems to generate data types needed to inform, develop, and improve data, knowledge, and/or models
- WHONDRS creates free, easy-to-use sampling kits with standardized materials and ships them to collaborators to enable consistent sampling methods. The kits include detailed protocols, which are visually represented in accompanying video protocols openly accessible on YouTube.
- WHONDRS provides many opportunities through email, one-on-one meetings, and videoconferences for collaborators to ask questions about the protocols, suggest improvements, and ensure they understand the sampling requirements.
- Field methods are kept consistent across studies to the degree appropriate. For example, all filtered water samples from WHONDRS use the same filter type (0.22 micron Sterivex), and all FTICR-MS water samples use the same preservation method (2 µL 85% phosphoric acid).
- Laboratory analyses are done using consistent methods both within and across WHONDRS studies. For example, WHONDRS worked closely with EMSL to develop standardized procedures for preparing and analyzing water samples via high-resolution mass spectrometry. These procedures span the sample life cycle, including storage, preparation, instrument settings, and data processing.
- Data processing for all WHONDRS data types is automated or will soon be, which provides transparency and reproducibility and allows for faster publication.
- Open exchange of data, software, and models throughout the research life cycle that are FAIR such that all are enabled to contribute and leverage resources.
- WHONDRS shares study concepts early in the design process via discussion-based collaborator videoconferences that provide an opportunity for feedback and ideas. These have allowed WHONDRS to adapt study designs based on community input and have led to protocol modifications, the introduction of new data types, and, during the Summer 2019 Sampling Campaign (S19S), a new type of involvement with WHONDRS in which collaborators volunteered to analyze WHONDRS-provided sample aliquots.
- All published data aside from metagenomics and metatranscriptomics are openly available on ESS-DIVE. Data are findable through a built-in search function within ESS-DIVE that is paired with a digital object identifier (DOI) for each data set. The ESS-DIVE search allows discovery of the full data sets, and the underlying data are accessible through an open-access license (CC0).
- All metagenomics and metatranscriptomics data are openly available via the Joint Genome Institute and KBase. They will also be available via the National Center for Biotechnology Information and will likely become available via the National Microbiome Data Collaborative.
- WHONDRS data sets are machine readable and include detailed field, laboratory, instrument, and data-processing metadata to maximize reusability of data.
- Data are published as soon as QA/QC is completed. Preliminary data are also posted on a Google Drive. WHONDRS does not hold back data for our own use before releasing data to the public.
- WHONDRS data are consistently structured and use community standards. Column headers for geochemistry include United States Geological Survey codes to aid with interoperability. ESS-DIVE is currently developing standards for data uploaded to the site, and WHONDRS will conform to those once they are finalized. For data types that currently lack standards, such as mass spectrometry data, WHONDRS uses a consistent format and is engaging with the community to develop standards.
- WHONDRS is incorporating several new components of being open throughout the research life cycle, including pre-registration of studies, ensuring all field-collected samples have International Geo Sample Numbers for unique identification, and pursuing crowdsourced analyses, interpretation, and publishing. WHONDRS has also developed a graphical user interface (GUI), which is published in one package (Stegen et al. 2018) and will be expanded to others. The GUI, which is described in Lin et al. 2020, allows for searching within a data set using search criteria (e.g., spatial bounding box and data types of interest) and, following sample selection, provides consistently formatted, machine-readable output that includes all data types in one ready-for-analysis package.
- Networked efforts, whereby data generation and/or sample collection are done with and for the scientific community such that the work is mutually beneficial and provides resources (e.g., data and sensors) to contributors that otherwise would be difficult or impossible for them to access.
- WHONDRS studies are informed and carried out by our community of collaborators across the world via free sampling kits. The networked approach allows WHONDRS to perform global studies with minimal infrastructure and provides collaborators with access to data types they would not otherwise have (e.g., FTICR-MS) for their local systems and the opportunity to contribute to a global effort.
- All data are available to meet the needs of individual researchers who may be interested in using the data in publications, in proposals, or for other uses. WHONDRS does not restrict data use and actively seeks out ways to help the community use the data to meet researcher needs. For example, a team of WHONDRS collaborators has added data types and combined their data with WHONDRS data to generate and submit a publication. Such efforts are highly encouraged, are mutually beneficial, and advance science through collective action.
- Existing research infrastructure has helped WHONDRS collect samples in a distributed way across the United States through Critical Zone Observatories, Long-Term Ecological Research sites, U.S. Forest Service Experimental Forests and Watersheds, DOE BER watershed testbeds associated with national laboratories, and the National Ecological Observatory Network.
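
The kind of search the GUI supports (a spatial bounding box plus data types of interest) can be illustrated with a minimal Python sketch. This is not the WHONDRS GUI or its code. The `Sample` record, the `select_samples` function, the site names, and the coordinates are all hypothetical and exist only to show the selection logic. The sketch also assumes the bounding box does not cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; the fields are illustrative, not a WHONDRS schema."""
    sample_id: str
    latitude: float        # decimal degrees, north positive
    longitude: float       # decimal degrees, east positive
    data_types: frozenset  # data types measured for this sample


def select_samples(samples, south, north, west, east, wanted_types):
    """Return samples inside the bounding box that have every wanted data type."""
    wanted = frozenset(wanted_types)
    return [
        s for s in samples
        if south <= s.latitude <= north
        and west <= s.longitude <= east
        and wanted <= s.data_types  # subset test: all wanted types are present
    ]


# Made-up records for illustration only.
SAMPLES = [
    Sample("site_A", 46.3, -119.3, frozenset({"FTICR-MS", "geochemistry"})),
    Sample("site_B", 35.9, -84.3, frozenset({"geochemistry"})),
    Sample("site_C", 47.1, -120.5, frozenset({"FTICR-MS"})),
]

selected = select_samples(
    SAMPLES, south=45.0, north=49.0, west=-125.0, east=-116.0,
    wanted_types={"FTICR-MS"},
)
print([s.sample_id for s in selected])  # prints ['site_A', 'site_C']
```

Requiring every requested data type (rather than any one of them) mirrors the idea of a ready-for-analysis package in which each selected sample carries all the data types of interest. A real tool might reasonably offer either behavior.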