Evolutionary investigations of the biosynthetic diversity in the skin microbiome using lsaBGC


Bacterial secondary metabolites, synthesized by enzymes encoded in biosynthetic gene clusters (BGCs), can underlie microbiome homeostasis and serve as commercialized products, which have historically been mined from a select group of taxa. While evolutionary approaches have proven beneficial for prioritizing BGCs for experimental characterization efforts to uncover new natural products, dedicated bioinformatics tools designed for comparative and evolutionary analysis of BGCs within focal taxa are limited. We thus developed lineage-specific analysis of BGCs (lsaBGC; https://github.com/Kalan-Lab/lsaBGC) to aid exploration of microdiversity and evolutionary trends across homologous groupings of BGCs, termed gene cluster families (GCFs), in any bacterial taxon of interest. lsaBGC enables rapid and direct identification of GCFs in genomes, calculates evolutionary statistics and conservation for BGC genes, and builds a framework to allow for base-resolution mining of novel variants through metagenomic exploration. Through application of the suite to four genera commonly found in skin microbiomes, we uncover new insights into the evolution and diversity of their BGCs. We show that the BGC of the virulence-associated carotenoid staphyloxanthin in Staphylococcus aureus is ubiquitous across the genus Staphylococcus. While one GCF encoding the biosynthesis of staphyloxanthin shows evidence of plasmid-mediated horizontal gene transfer (HGT) between species, another GCF appears to be transmitted vertically amongst a sub-clade of skin-associated Staphylococcus. Further, the latter GCF, which is well conserved in S. aureus, has been lost in most Staphylococcus epidermidis, the most common Staphylococcus species on human skin and one that is also regarded as a commensal. We also identify thousands of novel single-nucleotide variants (SNVs) within BGCs from the Corynebacterium tuberculostearicum sp. complex, a narrow, multi-species clade that features the most prevalent Corynebacterium in healthy skin microbiomes. Although novel SNVs were approximately 10 times as likely to correspond to synonymous changes when located in the top five percentile of conserved sites, lsaBGC identified SNVs that defied this trend and are predicted to underlie amino acid changes within functionally key enzymatic domains. Ultimately, beyond supporting evolutionary investigations of BGCs, lsaBGC also provides important functionalities to aid efforts for the discovery or directed modification of natural products.

Microbial Genomics