Scientists are racing to keep pace with COVID-19, creating new tools to figure out how the novel coronavirus works.
For researchers at Pacific Northwest National Laboratory (PNNL), understanding viral infection is a matter of mathematics rather than a purely molecular analysis. They are using an advanced mathematical tool called hypergraphs to identify how human cells respond to viral infection, including the new coronavirus. The key proteins participating in that response might be targets for developing medicines to treat COVID-19.
PNNL mathematician Emilie Purvine and computational biologist Jason McDermott recently presented their work virtually at the annual conference of the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), a meeting devoted to data mining, data science, and analytics.
Hypergraphs for viral infection
In a key step, the team tested the new approach with data from a similar virus, the coronavirus that causes Severe Acute Respiratory Syndrome, or SARS. That virus infected more than 8,000 people as it swept across the globe in 2003.
The PNNL team found that the results from the new method matched data previously collected about that virus. Using hypergraphs, the team identified and ranked the activity of several genes now known to be important to the virus that caused the 2003 SARS outbreak.
“Our work independently identified the same genes known to be important with SARS activity. This was an important step to take before applying our work to the virus that causes COVID-19,” said McDermott.
Now the PNNL team is applying the new technology to the current virus, using hypergraphs to sort out and rank the importance of many of the hundreds of genes active in COVID-19.
Purvine and McDermott have been using hypergraphs to explore how human cells respond to viral infections for the past two years. They’ve worked with data gathered by PNNL biologist Katrina Waters, who has been tracking gene expression, protein expression, and molecular changes in human cells infected with viruses including influenza, Zika, Ebola, and coronaviruses for about a decade.
To apply hypergraphs to this large data set, the researchers first had to work out how to group proteins in a way that would yield a meaningful hypergraph. The team was tackling that challenge earlier this year, just as the coronavirus pandemic hit.
From graphs to hypergraphs
The collaboration with Purvine offers a new tool to McDermott, who has been using graph-based mathematical techniques to analyze connections between genes, proteins, and signaling molecules in cells for years.
He and his colleagues identify relationships between two molecules at a time. Then they categorize connections between many separate interactions. Those connections quickly tangle into complex graphs representing molecular networks that keep cells functioning.
The researchers analyze the structure and shape of those graphs, looking for meaningful patterns that point to molecular components with key roles. One such pattern is centrality, in which a single molecule has many connections to others.
The entire structure of a graph is another meaningful pattern. Some central connections act like bridges to keep information flowing between different parts of the network. Genes or proteins involved in these “betweenness” connections likely keep an entire cell functioning properly.
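The difference between the two patterns can be illustrated with a small sketch. It is not the PNNL team's code or data: the network below is a made-up toy, the node names are hypothetical, and the betweenness calculation uses Brandes' algorithm as one common way to count how many shortest paths pass through each node.

```python
from collections import deque

# Hypothetical toy network (not real biological data): two tight
# clusters of molecules joined only through the node "X".
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),   # cluster 1
    ("C", "X"), ("X", "D"),               # X bridges the clusters
    ("D", "E"), ("D", "F"), ("E", "F"),   # cluster 2
]

def build_adjacency(edge_list):
    """Turn a list of undirected pairs into a neighbor lookup."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Number of direct connections each node has."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {n: 0.0 for n in adj}
    for source in adj:
        stack = []
        preds = {n: [] for n in adj}   # predecessors on shortest paths
        sigma = {n: 0 for n in adj}    # number of shortest paths from source
        dist = {n: -1 for n in adj}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                   # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {n: s / 2 for n, s in score.items()}

adjacency = build_adjacency(edges)
degree = degree_centrality(adjacency)
betweenness = betweenness_centrality(adjacency)

for node in sorted(adjacency):
    print(node, degree[node], betweenness[node])
```

In this toy network the bridge node X has only two direct connections, fewer than C or D, yet it sits on more shortest paths than any other node. That is the kind of component a degree count alone would miss.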
Hypergraphs represent a potential leap forward. Instead of representing connections between individual components, hypergraphs show relationships among groups of things. Since biological networks operate through molecular groups, scientists believe hypergraphs could represent their structure more realistically than standard graphs.
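A minimal sketch shows what a pairwise graph can lose. The protein names and groupings below are hypothetical, and the plain dictionary of sets is only an illustrative stand-in for a hypergraph, not the data structure or method the researchers used. One hypergraph holds a single group of three proteins; the other holds three separate two-protein interactions.

```python
from itertools import combinations

# Hypothetical example 1: one group in which three proteins act together.
one_group_of_three = {
    "g1": {"P1", "P2", "P3"},
}

# Hypothetical example 2: three separate pairwise interactions.
three_separate_pairs = {
    "g1": {"P1", "P2"},
    "g2": {"P2", "P3"},
    "g3": {"P1", "P3"},
}

def pairwise_projection(hyperedges):
    """Flatten each group into all of its two-member pairs (an ordinary graph)."""
    pairs = set()
    for members in hyperedges.values():
        for u, v in combinations(sorted(members), 2):
            pairs.add((u, v))
    return pairs

def group_membership_count(hyperedges):
    """Count how many groups each protein belongs to."""
    counts = {}
    for members in hyperedges.values():
        for node in members:
            counts[node] = counts.get(node, 0) + 1
    return counts

# Both hypergraphs collapse to the same ordinary graph...
same_graph = (
    pairwise_projection(one_group_of_three)
    == pairwise_projection(three_separate_pairs)
)
print("Same pairwise graph:", same_graph)

# ...but the hypergraphs themselves still tell the two situations apart.
print(group_membership_count(one_group_of_three))
print(group_membership_count(three_separate_pairs))
```

Once flattened to pairs, the two cases are indistinguishable, while the group-level view keeps the distinction between one three-way grouping and three independent interactions.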
Scientists have used hypergraphs to represent social groups and computer network infrastructure, but their computational complexity makes them an uncommon technique for studying large-scale biological networks that arise from experimental data.
An open-source hypergraph software tool called HyperNetX, developed at PNNL, makes this analysis more accessible to researchers in various disciplines. But applying the technique to data from a variety of fields still requires some tinkering.
“Since there are so many ways to build hypergraphs from biological data, biologists probably need to involve a computational mathematician to do this, for now,” Purvine said.
PNNL data scientist Cliff Joslyn pioneered hypergraph analysis projects at PNNL, and former PNNL computational biologist Jason Wendler helped Purvine and McDermott start the project applying hypergraphs to viral infection response.