November 18, 2024
Report

Proteomics Analysis of Human Contaminant Proteins

Abstract

Complete characterization of unknowns via proteomics remains challenging. Mass spectrometry-based proteomics data contain regions where empirical measurements are not attributed to peptides, and/or where peptides sequenced from mass spectra are not attributed to any source. These uncharacterized regions are known as the “dark” proteome. Many proteomics tools rely on some a priori knowledge of sample composition; few allow investigation of unknowns without composition assumptions. Further, the potentially low abundance of minor traces in these uncharacterized regions can make elucidation of the “dark” proteome challenging. Herein, we describe the development and evaluation of approaches to study the “dark” proteome and move towards an untargeted approach for more complete characterization, namely by studying minor human protein traces in non-human samples and combining that approach with non-human source organism identification that does not rely on composition assumptions. Human protein markers, in the form of genetically variant peptides (GVPs), have been extensively examined in a variety of human matrices, including blood, plasma, and hair, but have yet to be investigated as human contaminant traces in non-human samples, such as cell cultures. GVPs are peptides found in proteins carrying single nucleotide polymorphisms. In this work, we aimed to (1) investigate the feasibility of detecting human contaminant GVPs in a diverse set of non-human organisms using public proteomics data and a computational pipeline, and (2) develop a combined capability for untargeted source organism characterization and GVP detection. To our knowledge, this is the first report of applying these approaches towards a more complete proteomic characterization of unknowns.
We successfully demonstrate the feasibility of broad human contaminant GVP detection in proteomics data, develop a better understanding of GVP detectability, characterize the sample-to-sample variability in GVP detection, and identify a core set of GVPs that can potentially be used as markers indicative of the human contaminant trace portion of the “dark” proteome. Further, we developed and evaluated a combined pipeline, MARLOWE-GVP, that enables both untargeted source organism characterization and GVP detection. We show high accuracy in source organism characterization and a high degree of similarity in human contaminant GVP detection compared to the conventional approach. Success in both of these efforts has allowed us to advance our understanding and characterization of the “dark” proteome.

Published: November 18, 2024

Citation

Chu F., A. Lin, D.H. Lewis, S.C. Jenson, R.W. Seymour, E.D. Merkley, and K.L. Wahl. 2024. Proteomics Analysis of Human Contaminant Proteins. Richland, WA: Pacific Northwest National Laboratory.
