Biological Sciences Division
Comprehensive Proteomic Dataset of Ovarian Tumor Samples Released
One of the largest publicly available datasets for cancer researchers
Results: A collaboration between researchers from Pacific Northwest National Laboratory (PNNL) and Johns Hopkins University has produced a comprehensive dataset of proteomic analyses of high-grade serous ovarian tumor samples. Such tumors are the most common form of ovarian cancer. The dataset gives researchers the opportunity to develop and test novel proteogenomic integration tools and algorithms, extending their understanding of cancer biology and of how changes at the genomic through proteomic levels interact to drive cancer. That information can help identify clinical targets for treatment.
The dataset was released June 16 by the National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC). It is one of the largest public datasets covering the proteome, phosphoproteome, and glycoproteome with complementary deep genomic sequencing data on the same tumors. The datasets and corresponding metadata are publicly available at the CPTAC Data Portal. One of CPTAC's five large centers is located at PNNL.
The PNNL center is led by Battelle Fellow Richard D. Smith and Laboratory Fellow Karin Rodland as co-principal investigators, who provide complementary expertise in proteomic technologies and cancer biology, respectively.
Why It Matters: "Although cancer has always been considered a genomic disease, the instructions in the genome are executed by proteins," Rodland said. "Protein modifications like phosphorylation and glycosylation, which can't be determined directly from the genome, are key factors regulating protein function. This comprehensive database of observed proteins, phosphoproteins, and glycoproteins will provide a vital ingredient that is absolutely necessary for a full understanding of the processes driving tumor behavior and clinical outcomes."
The samples were previously analyzed at the genomic level by The Cancer Genome Atlas (TCGA). Integrating the proteomic and phosphoproteomic data with genome-level information from TCGA has the potential to identify the genetic and transcriptional changes most robustly linked to clinical outcomes, pointing the way to new prognostic markers and potentially novel therapeutic targets.
TCGA is a joint effort of the NCI and the National Human Genome Research Institute (NHGRI) to accelerate understanding of the molecular basis of cancer through application of genome analysis technologies, including large-scale genome sequencing.
Methods: The teams analyzed 174 TCGA samples, 32 of which were analyzed by both research groups. Investigators at Johns Hopkins University analyzed the glycoproteome and proteome, while PNNL's investigators examined the phosphoproteome and proteome.
The ovarian dataset is the third large-scale data release by CPTAC investigators covering tumors that were previously analyzed genomically by TCGA and have now been comprehensively characterized through deep proteomic analysis. Colorectal and breast cancer datasets were released in 2013 and 2014, respectively. More information about how TCGA and CPTAC partner to improve the ability to diagnose, treat, and prevent cancer can be found here.
Sponsor: The database is sponsored by the National Cancer Institute. Some of the work was performed at EMSL, a national scientific user facility located at PNNL and sponsored by the U.S. Department of Energy's Office of Biological and Environmental Research.
Research Team: The PNNL team includes Tao Liu (global and targeted proteomics), Sam Payne (bioinformatics), Jason McDermott (computational biology), Vlad Petyuk (biostatistics), Feng Yang (phosphoproteomics), Ron Moore (mass spectrometry), and Marina Gritsenko (sample processing).