November 19, 2021

Designing Microbe Factories for Sustainable Chemicals

Machine learning, Bayesian inference, and metabolic modeling help design biofuel-producing yeast

Combining machine learning with Bayesian inference and metabolic modeling helps design new yeast capable of producing biofuels.

(Illustration by Nathan Johnson | Pacific Northwest National Laboratory)

The science is clear: fossil fuels are harmful to the environment. So why is it so difficult for us to stop using them? Economic reasons are at least part of the answer. From our energy grid to the manufacturing of certain textiles and other products, many parts of our society are built to use fossil fuels. Transitioning away will come at some cost.

But what if we could produce an economically attractive replacement for fossil fuels? New research from Pacific Northwest National Laboratory (PNNL) suggests a way to do just that. Biologists have devised a way to engineer yeast to produce itaconic acid—a valuable commodity chemical—using data integration and supercomputing power as a guide.

This research was featured on the cover of the November 2021 issue of ACS Synthetic Biology. (Illustration by Nathan Johnson | Pacific Northwest National Laboratory)

Creating microbial factories using metabolic modeling 

Itaconic acid has enormous potential as a renewable chemical building block. It could substitute for some fossil-fuel-derived products. In 2004, it was named one of the “top value added chemicals from biomass” in a report by the Department of Energy (DOE). Seeing the potential of itaconic acid as a petrochemical replacement, data scientist Neeraj Kumar set out to inexpensively produce it using microbes.

Kumar and colleagues had previously developed a way to calculate how engineered changes in microbes could affect their metabolism. Building upon this idea, Kumar wanted to see if he could use these metabolic predictions to engineer yeast to produce high amounts of itaconic acid.

“We needed to identify what genes in the itaconic acid production pathway we could alter so the yeast could make greater quantities of the chemical,” said Kumar. “The challenge was finding the balance between cellular health and bioproduction.”

Data scientist Neeraj Kumar, pictured here, combines metabolic modeling and omics data integration with machine learning to help guide the design of yeast with special capabilities. (Photo by Andrea Starr | Pacific Northwest National Laboratory)


Itaconic acid is naturally produced by just a few fungi. PNNL scientist Ziyu Dai borrowed genes from other fungi to give Yarrowia lipolytica the ability to produce the chemical. Biologist Erin Bredeweg had been working on this modified yeast, containing several different gene combinations, when Kumar approached her to collaborate. Bredeweg and her colleagues had created a metabolic and proteomic profile of the modified yeast and passed the data to Kumar.

Taking cues from the Design-Build-Test-Learn strategy, Kumar and his research associate Andrew McNaughton used machine learning to examine this profile and identify which nonessential genes could be removed from the yeast, or which helpful ones could be added, to increase the production of itaconic acid.
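To give a feel for the kind of trade-off such a screen weighs, here is a minimal Python sketch. It is not the PNNL team's model: the reaction names, the capacities, and the proportional carbon split are all invented for illustration, and a real study would use a genome-scale metabolic model in place of the toy `predict_fluxes` function. The sketch removes one reaction at a time, discards designs whose predicted growth falls below a chosen fraction of the unmodified strain, and ranks the rest by predicted itaconic acid output.

```python
# Toy illustration only: reaction names and capacities are invented, and the
# proportional carbon split is a stand-in for a real metabolic model.
UPTAKE = 10.0  # carbon uptake, arbitrary units

# Relative capacity of each carbon sink in the hypothetical network.
SINKS = {
    "biomass": 5.0,
    "itaconate": 1.0,
    "byproduct_a": 3.0,
    "byproduct_b": 1.0,
}


def predict_fluxes(knockouts):
    """Split the uptake across the sinks that remain after the knockouts."""
    active = {name: cap for name, cap in SINKS.items() if name not in knockouts}
    total = sum(active.values())
    return {name: UPTAKE * cap / total for name, cap in active.items()}


def screen_knockouts(candidates, min_growth_fraction=0.9):
    """Rank single knockouts by product flux, keeping only healthy designs."""
    wild_type_growth = predict_fluxes(set())["biomass"]
    results = []
    for reaction in candidates:
        fluxes = predict_fluxes({reaction})
        growth = fluxes.get("biomass", 0.0)
        product = fluxes.get("itaconate", 0.0)
        # Cellular health check: reject designs that grow too slowly.
        if growth >= min_growth_fraction * wild_type_growth:
            results.append((reaction, growth, product))
    # Highest predicted product flux first.
    return sorted(results, key=lambda r: r[2], reverse=True)


ranked = screen_knockouts(["biomass", "byproduct_a", "byproduct_b"])
for reaction, growth, product in ranked:
    print(f"{reaction}: growth={growth:.2f}, product={product:.2f}")
```

In this toy network, removing the growth reaction would send the most carbon to the product, but the growth filter rejects it because the cell could no longer grow. That mirrors, in a very simplified way, the balance between cellular health and bioproduction that Kumar describes.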

Once they had selected the genes to “design” the organism, it was time to build. Bredeweg created different versions of the yeast with genes added or removed based on Kumar and McNaughton’s computational predictions. She then tested the different yeasts to see whether carbon flow toward itaconic acid production pathways was affected. Machine learning analysis of the RNA sequencing data indicated that the computational predictions matched the experimental outcomes, and it produced more detailed gene predictions for future analysis.

“Though this research is still in the early stages, it is exciting to see its potential,” said Bredeweg. “Machine learning and causal inference can uncover new ways of thinking about how a complex cell system, like yeast, could respond to individual gene changes, beyond what is possible from metabolic modeling alone.”

Biologist Erin Bredeweg shows off different cultures of the Yarrowia lipolytica yeast. (Photo by Andrea Starr | Pacific Northwest National Laboratory)

Machine learning and multiomics datasets expand the potential of metabolic modeling

Yeasts and other microbes are commonly used to produce useful chemicals. While it is easy to get them to produce some chemicals, like ethanol, in high yields, other chemicals pose more of a challenge. Kumar hopes that this system of combining machine learning with metabolic modeling and multiomics datasets will help overcome these production challenges.

“Though we still need more testing on this model, there is an amazing potential to expand this computationally guided bioengineering to other systems,” said Kumar. “This strategy could open up a new era in biosystem design for the production of eco-friendly chemicals.”

James Manzer, Jeremy Zucker, Meagan Burnet, Ernesto S. Nakayasu, and William Chrisler from PNNL’s Biological Sciences Division, Kyle R. Pomraning from PNNL’s Energy Processes and Materials Division, Eric D. Merkley from PNNL’s Signature Sciences and Technology Division, Nathalie Munoz and Scott E. Baker from the Environmental Molecular Sciences Laboratory (EMSL) located at PNNL, and Peter St. John from the National Renewable Energy Laboratory also contributed to this work. This research was supported by the Laboratory Directed Research and Development program at PNNL. Computing resources and metabolomic and proteomic experiments were supported by the Intramural program at EMSL.


About PNNL

Pacific Northwest National Laboratory draws on its distinguishing strengths in chemistry, Earth sciences, biology and data science to advance scientific knowledge and address challenges in sustainable energy and national security. Founded in 1965, PNNL is operated by Battelle for the Department of Energy’s Office of Science, which is the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information on PNNL, visit PNNL's News Center. Follow us on Twitter, Facebook, LinkedIn and Instagram.