Bacterial genetic diversity is often described solely using base-pair changes despite a wide variety of other mutation types likely being major contributors. Tandem duplication/amplifications are thought to be widespread among bacteria but due to their often-intractable size and instability, comprehensive studies of these mutations are rare. We define a methodology to investigate amplifications in bacterial genomes based on read depth of genome sequence data as a proxy for copy number. We demonstrate the approach with Bordetella pertussis, whose insertion sequence element-rich genome provides extensive scope for amplifications to occur. Analysis of data for 2430 B. pertussis isolates identified 272 putative amplifications, of which 94 % were located at 11 hotspot loci. We demonstrate limited phylogenetic connection for the occurrence of amplifications, suggesting unstable and sporadic characteristics. Genome instability was further described in vitro using long-read sequencing via the Nanopore platform, which revealed that clonally derived laboratory cultures produced heterogenous populations rapidly. We extended this research to analyse a population of 1000 isolates of another important pathogen, Mycobacterium tuberculosis. We found 590 amplifications in M. tuberculosis, and like B. pertussis, these occurred primarily at hotspots. Genes amplified in B. pertussis include those involved in motility and respiration, whilst in M. tuberuclosis, functions included intracellular growth and regulation of virulence. Using publicly available short-read data we predicted previously unrecognized, large amplifications in B. pertussis and M. tuberculosis. This reveals the unrecognized and dynamic genetic diversity of B. pertussis and M. tuberculosis, highlighting the need for a more holistic understanding of bacterial genetics.
|Number of pages||15|
|Publication status||Published - 10 Feb 2022|