Classification is a common task in the analysis of 'omics data, used to build predictive models of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze the data they generate is extremely limited. Conditional logistic regression is the most commonly used technique, but its modeling assumptions limit its ability to identify a large class of sufficiently complicated 'omic signatures. We show that, by applying a simple data preprocessing step, any linear or non-linear classification algorithm can be used to model case-control data. Using simulated case-control data, we demonstrate that this preprocessing step improves both the classification accuracy and the variable selection accuracy of each method. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with type 1 diabetes.
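The abstract does not say what the preprocessing step is. As an illustration only, the sketch below assumes 1:1 matched pairs and relies on a well-known equivalence: for a single matched pair, the conditional logistic likelihood that the case is the one observed as the case is sigmoid(beta · (x_case - x_control)), so a no-intercept logistic model fitted to within-pair differences recovers the conditional logistic coefficients. Negating half of the differences at random (and labeling those 0) turns the data into a balanced binary problem that an off-the-shelf classifier can accept. The function name `paired_differences` and the random sign-flipping scheme are hypothetical choices for this sketch, not necessarily the authors' procedure.

```python
import numpy as np


def paired_differences(X, stratum, is_case, rng=None):
    """Hypothetical sketch: convert 1:1 matched case-control data into an
    unconditional binary classification problem.

    For each matched pair, form d = x_case - x_control. Half of the pairs
    (rounded down, chosen at random) keep d with label 1; the remaining
    pairs are negated to -d with label 0. This is an illustrative
    assumption, not necessarily the preprocessing used in the paper.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    stratum = np.asarray(stratum)
    is_case = np.asarray(is_case, dtype=bool)

    diffs = []
    for s in np.unique(stratum):
        idx = np.flatnonzero(stratum == s)
        cases = idx[is_case[idx]]
        controls = idx[~is_case[idx]]
        # This sketch only handles exactly one case and one control per stratum
        if len(cases) != 1 or len(controls) != 1:
            raise ValueError(f"stratum {s!r} is not a 1:1 matched pair")
        diffs.append(X[cases[0]] - X[controls[0]])

    D = np.vstack(diffs)
    n = len(D)
    # Assign label 1 to a random half of the pairs, label 0 to the rest
    y = np.zeros(n, dtype=int)
    y[rng.permutation(n)[: n // 2]] = 1
    # Keep the difference where the label is 1, negate it where the label is 0
    D_out = np.where(y[:, None] == 1, D, -D)
    return D_out, y
```

The returned `D_out` and `y` can be passed to any classifier. For a linear model, fitting without an intercept keeps the correspondence with conditional logistic regression, because negating a difference and flipping its label leaves the no-intercept logistic likelihood unchanged.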
Revised: August 13, 2019
Published: January 1, 2019
Citation
Stanfill B.A., S.M. Reehl, L.M. Bramer, E.S. Nakayasu, S.S. Rich, T.O. Metz, M. Rewers, et al. 2019. "Extending Classification Algorithms to Case-Control Studies." Biomedical Engineering and Computational Biology 10. PNNL-SA-135302. doi:10.1177/1179597219858954