At US HUPO, New PNNL Software Tools Get an Airing
An estimated 20,300 genes in the human genome encode proteins. The number of proteins themselves, as intact proteoforms, could be as high as one billion. That vast number makes the functional protein architecture of humans - the proteome - much harder to map than the genome.
Yet mapping the proteome is essential because understanding the activities and functions of proteins could speed the development of protocols for diagnosis, treatment, and prevention of disease.
Internationally, the Human Proteome Organization (HUPO) coordinates this kind of research. The United States chapter, US HUPO, is hosting its 13th annual conference this week (March 19-22). It features presentations, lightning talks, industry lunch seminars, short courses, workshops, and poster sessions. Among the presenters is PNNL's Sam Payne, an Integrative Omics scientist and team lead who develops algorithms for analyzing proteomics data.
Proteomics data are collected by mass spectrometry analytical strategies designed to reveal the function and activity of proteins, in part by accurately measuring a protein's mass and charge.
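In practice, a mass spectrometer reports a mass-to-charge ratio (m/z) for each ionized protein, from which the neutral mass is recovered. A minimal sketch of that relationship, using an assumed (hypothetical) 30,000 Da intact protein carrying 30 protons:

```python
# m/z observed for a protonated ion: m/z = (M + z * m_proton) / z
PROTON_MASS = 1.00728  # approximate mass of a proton, in daltons

def mass_to_charge(neutral_mass: float, charge: int) -> float:
    """m/z for a protein of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge

# A hypothetical 30,000 Da protein with 30 charges appears near m/z 1001
print(round(mass_to_charge(30000.0, 30), 5))  # → 1001.00728
```

Intact proteins typically appear at many different charge states at once, which is one reason top-down spectra are harder to interpret than peptide spectra.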
At the US HUPO conference in San Diego this week "there are multiple broad topics," said Payne - many of them with an emphasis on biology and on proteomics related to cancer, heart disease, and neurological disorders. But he is part of the more technological side of the proceedings.
As a backdrop to US HUPO, said Payne, "PNNL has a very long history in leading top-down analysis" in both instruments and informatics.
His talk, on the afternoon of March 21, comes during a session on top-down analysis of protein complexes. This approach analyzes each protein while it is intact, instead of relying on measuring a protein's smaller parts.
Payne's theme is software that keeps up with the rapid evolution of mass spectrometry instrumentation. "We are trying to make flexible software so that as the instruments change it takes less effort for us to adapt," he said.
His presentation introduces an open-source suite of software for top-down proteomics analysis called Informed-Proteomics. It offers substantial improvements in feature finding, database search algorithms, and semi-automated learning methods. Funding came from the U.S. Department of Energy and from the National Institute of General Medical Sciences.
In the traditional and more common bottom-up process, proteins are enzymatically digested into smaller peptides, which are measured and then computationally reassembled. Researchers get a higher number of identifications and faster throughput, but results related to protein isoforms and species can be inconclusive. In addition, peptides do not always map uniquely to a single human gene.
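That mapping ambiguity is easy to illustrate with a toy example (the sequences and the peptide below are invented, not real human proteins): a single digested peptide can occur in the products of more than one gene, so its identification cannot be attributed uniquely.

```python
# Toy illustration of shared-peptide ambiguity (all sequences made up)
proteins = {
    "GENE_A": "MKTAYIAKQRPEPTIDEKGVNDNEEGFFSAR",
    "GENE_B": "MSLNPEPTIDEKAAQQWERTYHKL",
    "GENE_C": "MGGHHLLVVAAPPQQRRK",
}

def genes_containing(peptide: str) -> list[str]:
    """Return every gene whose protein sequence contains the peptide."""
    return [gene for gene, seq in proteins.items() if peptide in seq]

# The peptide matches two genes, so the evidence is ambiguous
print(genes_containing("PEPTIDEK"))  # → ['GENE_A', 'GENE_B']
```

Top-down analysis sidesteps this problem by measuring the whole proteoform, at the cost of far more complex spectra.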
"Studying a protein in its native structure is important," said Payne, since so much more information about the protein is preserved. "But there are technical challenges getting to the scale you want to be." The spectra derived from top-down methods are much more complex, and require new software tools and novel algorithms to meet what he called the "hugely challenging" idea of measuring all the proteins in a cell.
For one, in top-down analyses the size of intact proteins means the signal after ionization is spread out over many dimensions. For another, the "search space" of potential proteoforms is very large, since the combinatorial universe of proteins can number up to a billion.
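The scale of that search space follows from simple back-of-the-envelope combinatorics (the site counts here are illustrative, not measured values): if each of n modifiable residues on a protein can independently be modified or not, the candidate proteoforms for a single gene product grow as 2**n.

```python
# Combinatorial growth of candidate proteoforms:
# n modifiable sites, each modified or unmodified -> states**n proteoforms
def proteoform_count(sites: int, states_per_site: int = 2) -> int:
    return states_per_site ** sites

print(proteoform_count(10))  # → 1024 candidates for just 10 sites
print(proteoform_count(30))  # → 1073741824 — past a billion for one protein
```

Exponential growth like this is why naive enumeration fails, and why Payne points to "the right mathematics" for organizing an efficient search.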
"The challenge with top-down is that what you look for is extraordinarily large," said Payne - and that requires the right mathematics "to organize an efficient way to search."
How do you interpret all that top-down data? "There are very unique challenges to studying the protein as a whole," he said, especially if the instruments acquiring the data keep changing. "It means you're always dealing with a moving target."