Artificial intelligence for the discovery and engineering of microbial biosynthetic gene clusters


Date
Jan 24, 2024 10:47 PM
Event
Projects 2024

Description: This project will expand the popular antiSMASH pipeline for the annotation of biosynthetic gene clusters (BGCs) in microbial genomes, with the aim of facilitating the engineering of novel biosynthetic pathways in heterologous expression systems. We will apply statistical machine learning / AI to a growing body of bacterial genomes to (a) improve the identification of gene cluster boundaries; (b) refine the delimitation of co-evolving functional “modules” (sub-clusters) within BGCs; (c) define assembly rules for the design of hybrid gene clusters and for the successful domain-shuffling of the multi-domain enzymes that are often responsible for the core biosynthesis of natural products.

The project will use tools from sequential optimisation to discover engineering rules that help create novel functional gene clusters from the components identified in the genomes.
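
As a rough illustration of what sequential optimisation means in this setting, the sketch below proposes one hybrid cluster design at a time, scores it, and uses all scores seen so far to choose the next design. Everything in it is an assumption made for illustration: the module names, the `toy_yield` scoring function (a stand-in for an experimental measurement), and the simple additive upper-confidence-bound heuristic, which is not the Bayesian optimisation over string spaces of reference 4 and not a method taken from the project itself.

```python
import itertools
import math

# Hypothetical module options for three positions in a hybrid gene cluster.
SLOTS = [("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2", "C3")]


def toy_yield(design):
    """Stand-in for a wet-lab measurement; the numbers are made up."""
    weights = {"A1": 0.1, "A2": 0.6, "A3": 0.3,
               "B1": 0.5, "B2": 0.2, "B3": 0.1,
               "C1": 0.0, "C2": 0.3, "C3": 0.7}
    # Toy interaction: two specific modules work better together.
    bonus = 0.5 if design[0] == "A2" and design[2] == "C3" else 0.0
    return sum(weights[m] for m in design) + bonus


def sequential_search(evaluate, slots, budget, explore=0.5):
    """Test one design per step, guided by the scores observed so far."""
    candidates = list(itertools.product(*slots))
    totals = {}  # module -> summed score of tested designs containing it
    counts = {}  # module -> number of tested designs containing it
    tested = {}  # design -> observed score

    for step in range(1, budget + 1):
        untested = [d for d in candidates if d not in tested]
        if not untested:
            break

        def acquisition(design):
            # Additive estimate per module plus a bonus for rarely tried modules.
            value = 0.0
            for m in design:
                n = counts.get(m, 0)
                mean = totals[m] / n if n else 0.0
                value += mean + explore * math.sqrt(math.log(step + 1) / (n + 1))
            return value

        choice = max(untested, key=acquisition)
        score = evaluate(choice)
        tested[choice] = score
        for m in choice:
            totals[m] = totals.get(m, 0.0) + score
            counts[m] = counts.get(m, 0) + 1

    best = max(tested, key=tested.get)
    return best, tested[best], tested
```

Calling `sequential_search(toy_yield, SLOTS, budget=10)` tests ten of the 27 possible designs and returns the best one found, its score, and the full record of tested designs. The `explore` parameter trades off re-using modules that scored well against trying modules that have rarely been tested.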

Significance: BGCs encode the enormous biosynthetic capacity of microbes and form the basis of their biotechnological exploitation for a wide range of purposes, in particular the production of powerful drugs (antibiotics, anticancer agents). The ML/AI tools developed here will facilitate the detection and engineering of these gene clusters to enable the rapid diversification of natural product libraries by combinatorial assembly. This requires the reliable identification of design rules, which will be enabled by larger datasets and the application of recent advances in AI technologies.

Impact: These tools will be of great interest to the vibrant community of microbiologists and synthetic biologists that is currently using the antiSMASH toolbox. At the same time, the project will provide advanced training for the next generation of computational scientists working at the intersection of (Bio)Science and AI/ML.

Deliverables: Each of the work packages (a–c) of this project is foreseen to result in a peer-reviewed publication. All developments will also be implemented in user-friendly software and will be made available as stand-alone tools or as part of the antiSMASH pipeline.

References

  1. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R (2011): antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucl. Acids Res. 39:W339–W346. [This paper and regular updates published subsequently have been cited more than 5,000 times and have become the industry standard in the field. – https://antismash.secondarymetabolites.org/#!/start]

  2. Medema MH, Kottmann R, Yilmaz P, Cummings M, [146 co-authors in alphabetical order], Breitling R, Takano E, Glöckner FO (2015): Minimum Information about a Biosynthetic Gene cluster (MIBiG). Nature Chem. Biol. 11:625–631. [This community publication initiated by the supervisory team provides the foundational database for this project. – https://mibig.secondarymetabolites.org/]

  3. Del Carratore F, Zych K, Cummings M, Takano E, Medema MH, Breitling R (2019): Computational identification of co-evolving multi-gene modules in microbial biosynthetic gene clusters. Communications Biology 2:83. [This proof-of-concept study established some of the statistical methodology on which this project will be building.]

  4. Moss H, Beck D, Gonzalez J, Leslie DS, Rayson P (2020): BOSS: Bayesian Optimization over String Spaces. Advances in Neural Information Processing Systems 33.

Additional information

Additional information in Find A PhD

Rainer Breitling
Professor of Systems Biology, Chemical Biology and Biological Chemistry
Mauricio A Álvarez
Senior Lecturer in Machine Learning
Eriko Takano
Professor of Synthetic Biology, Chemical Biology and Biological Chemistry