September 20, 2024
Journal Article

Predicting metabolic modules in incomplete bacterial genomes with MetaPathPredict

Abstract

The reconstruction of complete microbial metabolic pathways using ‘omics data from environmental samples remains challenging. Computational pipelines for pathway reconstruction that utilize machine learning methods to predict the presence or absence of KEGG modules in incomplete genomes are lacking. Here, we present MetaPathPredict, a software tool that incorporates machine learning models to predict the presence of complete KEGG modules within bacterial genomic datasets. Using gene annotation data and information from KEGG module databases, MetaPathPredict employs neural network and XGBoost stacked ensemble models to reconstruct and predict the presence of KEGG modules in a genome. MetaPathPredict can be used as a command line tool or as an R package, and both options are designed to be run locally or on a compute cluster. In our benchmarks, MetaPathPredict makes robust predictions of KEGG module presence within highly incomplete genomes.

Published: September 20, 2024

Citation

Geller-McGrath D.E., K. Konwar, V. Edgcomb, M. Pachiadaki, J. Roddy, T. Wheeler, and J.E. McDermott. 2024. Predicting metabolic modules in incomplete bacterial genomes with MetaPathPredict. eLife 13: Art. No. e85749. PNNL-SA-197656. doi:10.7554/eLife.85749
