Good intentions, though, ran into typical research-sharing snags: a flood of emailed files, unwieldy file sizes, scant documentation, and no uniform approach to storage.
“A lot of duct tape and baling wire solutions to figure out how to actually share data,” said Burleyson, an Earth scientist in the Atmospheric Sciences and Global Change Division of Pacific Northwest National Laboratory (PNNL). “In the past, there was a lot of, ‘I got the data from Jim who got it from Bob who got it from Phil in 2009.’ That doesn’t inspire a lot of confidence for reusing the data in a peer-reviewed study.”
A better way was needed. Burleyson and collaborators began working toward that goal in June 2018 with an exploratory conversation with DOE managers. Dozens of meetings and tests later, they are ready to roll out their solution, called MSD-LIVE. (The ‘LIVE’ in MSD-LIVE stands for Living, Intuitive, Value-adding, Environment.) The data management platform will be released on August 22 to the hundreds of scientists at national laboratories and universities who contribute to MSD. MSD research explores the dynamics and co-evolutionary pathways of human and Earth systems, with a focus on critical goods, services, and amenities delivered to people through interdependent sectors.
MSD-LIVE is a cloud-based data and code management system and advanced computing platform that enables researchers to document and archive data, run their models and analysis tools, and share data, software, and multi-model workflows.
“MSD-LIVE will provide a stable, reliable collaboration platform for the MSD community that’s easy to use,” said Burleyson.
The team devoted more than two years to listening to the MSD community and educating its members about the possibilities of embracing the open-science movement. They sought to better understand the specific challenges MSD researchers faced with data collaboration and code management. That outreach, said Burleyson, helped the team tailor the capabilities of MSD-LIVE to the needs of the community.
“We’ve created an interface that’s intuitive and easy to use,” he said, “so that it makes it easier for MSD scientists to document their data and share it with their collaborators via the cloud. MSD-LIVE also makes the data citable—something that had been very difficult to accomplish with previous data sharing efforts.”
Burleyson and collaborators considered nearly a dozen open-source digital repository and data management frameworks before settling on a winner, the Invenio Research Data Management (InvenioRDM) system. They chose InvenioRDM based on its functionality, flexibility, robustness, documentation, support, and usability.
They also enlisted Carina Lansing, a PNNL software engineer, to help mesh the laboratory’s enabling infrastructure with InvenioRDM’s capabilities for archiving and publishing datasets and software.
“All science has data challenges, but the MultiSector Dynamics community is especially data challenged because they link multiple models together and, as a result, have a ton of datasets that have to be moved around to be able to run those models,” said Lansing.
In addition to Burleyson and Lansing, the MSD-LIVE development team at PNNL includes software engineers Zoë Guillen, Devin McAllester, Matthew Macduff, and Elvis Offer, and user experience engineer Anna Sabin. Jon Weers, a data scientist with the National Renewable Energy Laboratory, also contributed.