October 2, 2021
Journal Article

A guide to using GitHub for developing and versioning data standards and reporting formats

Abstract

Data standardization combined with descriptive metadata is a key mechanism for facilitating data reuse, which is the ultimate goal of the Findable, Accessible, Interoperable and Reusable (FAIR) principles. Community-developed data or metadata standards and reporting formats are increasingly created through a bottom-up approach that emphasizes collaboration among stakeholders. Such an approach requires collaboration platforms for the standards development process that center on sharing information and receiving feedback. Our objective in this study was to conduct a systematic review to identify the organizations that use version control for developing data standards, summarize common practices, and outline guidance for groups that want to use version control in their own standards development. Of the 108 Earth and Environmental Science data standards and reporting formats identified in our review, 32 used GitHub as the version control platform for development. We found no universally accepted methodology for developing and publishing data standards. Moreover, many GitHub repositories were missing key information (e.g., usage licenses, GitHub issue templates, citations for where documents are stored in a permanent repository), which limits developers' ability to gather user feedback and to create or improve standards that build on previous work. We provide guidance for community-driven standards development and associated documentation based on a systematic review of existing practices.

Published: October 2, 2021

Citation

Crystal-Ornelas R., C. Varadharajan, B. Bond-Lamberty, K.E. Boye, M. Burrus, S. Cholia, and M. Crow, et al. 2021. A guide to using GitHub for developing and versioning data standards and reporting formats. Earth and Space Science 8, no. 8:e2021EA001797. PNNL-SA-160830. doi:10.1029/2021EA001797