Abstract
Top-down proteomics is the analysis of proteins in their intact form without proteolysis, thus preserving valuable information about post-translational modifications, isoforms, and proteolytic processing. However, it is still a developing field due to limitations in instrumentation, difficulties in interpreting complex mass spectra, and a lack of well-established quantification approaches. TopPIC is a popular tool for proteoform identification. We extended its capabilities to label-free proteoform quantification by developing a companion R package, TopPICR. Key steps in the TopPICR pipeline include filtering identifications, inferring a minimal set of protein accessions that explains the observed sequences, aligning retention times, recalibrating measured masses, clustering features across datasets, and finally compiling feature intensities using the match-between-runs approach. The output of the pipeline is an MSnSet object, which makes downstream data analysis seamlessly compatible with packages from the Bioconductor project. TopPICR also provides the capability to visualize proteoforms within the context of the parent protein sequence. The functionality of TopPICR is demonstrated on top-down LC-MS/MS datasets of 10 human-in-mouse xenografts of luminal and basal breast tumor samples.
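As an illustration of the accession-inference step only, the sketch below treats "a minimal set of protein accessions" as a set-cover problem and solves it with a greedy heuristic. This is an assumption for illustration: the abstract does not state which algorithm TopPICR uses, the code is Python rather than R, and the function name `infer_minimal_accessions`, the input layout, and the accessions `P1` to `P3` are all hypothetical.

```python
from typing import Dict, List, Set


def infer_minimal_accessions(candidates: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: pick accessions until every sequence is explained.

    `candidates` maps a protein accession to the set of observed sequences
    it could explain. Both this input layout and the greedy heuristic are
    illustrative assumptions, not TopPICR's documented behavior.
    """
    # Every observed sequence starts out unexplained.
    uncovered: Set[str] = set().union(*candidates.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the accession explaining the most still-unexplained sequences;
        # ties are broken alphabetically so the result is deterministic.
        best = min(
            candidates,
            key=lambda acc: (-len(candidates[acc] & uncovered), acc),
        )
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical accessions and sequences, for illustration only.
hits = {
    "P1": {"seqA", "seqB", "seqC"},
    "P2": {"seqB"},
    "P3": {"seqC", "seqD"},
}
print(infer_minimal_accessions(hits))  # ['P1', 'P3']
```

Here `P2` is dropped because `P1` already explains its only sequence. A greedy cover is not guaranteed to be the smallest possible set, but it is a common, simple approximation for this kind of parsimony problem.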
Published: August 18, 2023