January 1, 2021
Journal Article

A review of imputation strategies for isobaric labeling-based shotgun proteomics

Abstract

The throughput efficiency and increased depth of coverage provided by isobaric-labeled proteomics measurements have led to increased usage of these techniques. However, the structure of missing data in these experiments differs from that in unlabeled studies. In this review, we compare the efficacy of nine imputation methods on a CPTAC proteomics iTRAQ dataset. Imputation methods were evaluated with regard to accuracy, variability, statistical hypothesis test inference and run time across datasets with varying numbers of iTRAQ plexes and percentages of missing data. In general, expectation maximization and random forest imputation methods yielded the best performance, and constant-based methods consistently performed poorly across all dataset sizes and percentages of missing values. For datasets with small sample sizes and higher percentages of missing data, results indicate that statistical inference with no imputation may be preferable. These findings indicate that a core set of imputation methods performs better for isobaric-labeled proteomics data. However, whether to impute at all deserves careful consideration for datasets with a small number of samples, as do factors such as computational time and the reproducibility of imputed values.
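To make the accuracy criterion concrete, the following is a minimal sketch of one way an imputation method could be scored: hide values that are actually known, impute them, and compare against the truth. It is not the authors' pipeline. The data are synthetic, the function names are hypothetical, and the plex size of 4, the block-wise masking (all channels of a plex missing together for a protein), the half-minimum constant and the per-protein mean are illustrative assumptions only. The mean stands in as a simple data-driven method and is not one of the expectation maximization or random forest methods named above.

```python
import numpy as np


def mask_by_plex(data, plex_size, frac_missing, rng):
    """Hide whole plex blocks per protein (row); return masked copy and boolean mask.

    Assumption: a protein is missing in all channels of a plex at once.
    """
    n_proteins, n_samples = data.shape
    if n_samples % plex_size != 0:
        raise ValueError("number of samples must be a multiple of plex_size")
    n_plexes = n_samples // plex_size
    # One random draw per (protein, plex) block, expanded to every channel in the plex.
    block_mask = rng.random((n_proteins, n_plexes)) < frac_missing
    mask = np.repeat(block_mask, plex_size, axis=1)
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask


def impute_constant(masked):
    """Constant-based imputation: fill every gap with half the observed global minimum."""
    return np.where(np.isnan(masked), 0.5 * np.nanmin(masked), masked)


def impute_row_mean(masked):
    """Fill gaps with the per-protein observed mean (global mean if a row is empty)."""
    observed = ~np.isnan(masked)
    counts = observed.sum(axis=1)
    sums = np.where(observed, masked, 0.0).sum(axis=1)
    global_mean = np.nanmean(masked)
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(observed, masked, row_means[:, None])


def rmse(truth, imputed, mask):
    """Root mean squared error over the artificially hidden entries only."""
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))


# Synthetic log2-scale abundances: 200 proteins, 3 plexes of 4 channels (all assumed).
rng = np.random.default_rng(0)
protein_means = rng.normal(20.0, 2.0, size=(200, 1))
truth = protein_means + rng.normal(0.0, 0.5, size=(200, 12))

masked, mask = mask_by_plex(truth, plex_size=4, frac_missing=0.3, rng=rng)
rmse_const = rmse(truth, impute_constant(masked), mask)
rmse_mean = rmse(truth, impute_row_mean(masked), mask)
print(f"constant: {rmse_const:.2f}  row mean: {rmse_mean:.2f}")
```

Repeating this over several values of `frac_missing` and numbers of plexes would give the kind of grid the abstract describes, though variability, hypothesis test inference and run time would each need their own measures.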

Revised: January 18, 2021 | Published: January 1, 2021

Citation

Bramer L.M., J. Irvahn, P.D. Piehowski, K.D. Rodland, and B.M. Webb-Robertson. 2021. A review of imputation strategies for isobaric labeling-based shotgun proteomics. Journal of Proteome Research 20, no. 1:1-13. PNNL-SA-149520. doi:10.1021/acs.jproteome.0c00123