April 5, 2024

New, Publicly Available Collection of Data Illuminates Molecular Host Responses to Lethal Human Viruses

Data reveals mechanisms in our bodies that can help us fight viral pathogenesis

Multiple viruses and bacteria in a colorful illustration

(Corona Borealis Studio | Shutterstock)

Scientists have studied deadly viruses for centuries. Their origin, proliferation, and evolution leave clues about how these pathogens survive, as well as how we can hold them at bay. But studying the viruses alone won’t answer all our questions about how they cause disease.

Viruses need living cells to survive—and thrive—making humans an ideal home. Yet we’ve seen time and again how symptoms and survival rates differ from person to person, even when infected with the same virus. The question is, “how does the virus outsmart our body’s response to attack?”

In 2013, the National Institutes of Health’s National Institute of Allergy and Infectious Diseases funded OMICS Technologies for Predictive Modeling of Infectious Diseases, called OMICS of Lethal Human Viruses (LHV), through the Systems Biology Centers for Infectious Diseases to get us closer to an answer. The goal was to use global data on how a host (a living thing) responds to viral infections, then model those responses to identify host-dependent mechanisms that regulate severe or fatal disease. In short, the project sought to identify host targets that could be used to design new drugs or therapeutics and reduce viral disease.

Sharing the wealth

Over the next six years, a consortium of scientists under the OMICS-LHV project conducted 45 comprehensive infection experiments on four lethal virus species: Influenza A, Ebola virus, West Nile virus, and Middle East Respiratory Syndrome coronavirus (MERS-CoV). Using human and mouse host model systems, they collected samples over a course of time, rather than just once, to understand changes in the host during viral infection. Then they conducted multi-omics analyses of the samples, resulting in more than 24,000 raw genomics, proteomics, metabolomics, and lipidomics datasets that characterize host response at a molecular level. The group published more than 20 papers from those analyses, each telling an individual story about the virus-host relationships they discovered.

The culminating paper, published this month in Scientific Data, describes a new data collection composed of statistically processed datasets derived from the raw datasets produced over the entire project. The collection is linked to experimental data about the phenotype, or severity of disease. Writing the paper took three years, partly because of the COVID-19 pandemic but also because the group was busy cataloguing and curating these datasets in a single place, DataHub. What’s more, they did so in compliance with future data-sharing mandates outlined by the FAIR guiding principles: findability, accessibility, interoperability, and reusability.

“We brought the data back from the grave and made it timeless,” said Lindsey Anderson, a computational chemical biologist at Pacific Northwest National Laboratory (PNNL) and co-first author on the paper.

Had the group left the data as-is in 2019 when the project ended, it wouldn’t have complied with future data-sharing policies or been as useful to others in the scientific community. But by taking the time to revitalize the data, repackaging it with updated standards and revised metadata, they ensured compliance with the new 2023 Final NIH Policy for Data Management and Sharing, as well as broader, future mandates for open science data management that go into effect in 2025.

“The collection is extremely useful to anyone who wants to understand viral disease pathophysiology or network biology,” said Katrina Waters, a chief scientist and laboratory fellow at PNNL and a corresponding author on the paper. “Typically, we see separate datasets of genomics, metabolomics, and proteomics in their own public repositories, but this is the first phenotype-linked data collection that gives scientists a more complete picture of virus-host relationships.”

A portion of the raw datasets was used in a 2021 research project for the Department of Energy’s National Virtual Biotechnology Laboratory. The resulting paper shows how hypergraph models of biological networks can help researchers identify important genes in viral infections.

Sharing data across the scientific community and among various sponsors, in the spirit of open science, is precisely what the paper’s authors set out to do.

“With the raw and statistically processed datasets now available, the data collection is accessible to those with and without a background in statistics,” said Amie J. Eisfeld, a scientist at the University of Wisconsin-Madison (UW-Madison) and a co-corresponding author on the paper. “A lot can still be discovered from this data.”

In all, 29 scientists from 10 institutions contributed to the final data collection paper. Eisfeld led the project with Dr. Yoshi Kawaoka, also from UW-Madison. PNNL conducted the omics analyses at the Environmental Molecular Sciences Laboratory (EMSL), a national scientific user facility sponsored by the U.S. Department of Energy.

Waters said that since the COVID-19 pandemic, the scientific community has recognized the need to identify host-targeted interventions against emerging viruses. “Viruses change and evolve so rapidly. It’s not enough to wait for a virus to spread and then work backward to figure out how it became lethal after an outbreak occurs,” she said. “We need to use what we know about common host responses to virus infection to build therapeutics that reduce disease while we open more pathways to shutting down virus survival before it can spread.”
