Genomic comparison of E3 and NRRL23338
The whole genome of E3 was re-sequenced on an Illumina HiSeq 2000 platform, yielding approximately 1.2 × 10^7 raw reads of 150 bp. The overall features of the E3 genome are highly similar to those of NRRL23338 (Table 1) (Oliynyk et al., 2007). The E3 genome is 8,181,083 bp long, with a GC content of 71.81%. A total of 7,669 genes were predicted bioinformatically, comprising 7,607 protein-coding sequences, 12 rRNA genes and 50 tRNA genes. The ncRNA profile of E3 is identical to that of NRRL23338, suggesting that the translation machinery remained largely conserved after mutagenesis.
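As a quick consistency check on the figures above, the mean sequencing depth can be estimated as total sequenced bases divided by genome size (the Lander-Waterman coverage formula). The sketch below assumes that all ~1.2 × 10^7 raw reads map to the genome, so it gives an upper bound rather than the realized depth after quality filtering and mapping; the constant and function names are illustrative, not taken from the study.

```python
# Rough sequencing-depth and gene-tally check using the numbers reported for E3.
# Assumption (not from the study): every raw read maps to the genome, so this
# is an upper bound on the mean depth actually achieved after filtering.
READ_COUNT = 1.2e7           # approximate number of raw reads
READ_LENGTH_BP = 150         # read length in bp
GENOME_SIZE_BP = 8_181_083   # assembled E3 genome length in bp


def mean_depth(n_reads: float, read_len: int, genome_size: int) -> float:
    """Lander-Waterman coverage: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


# Predicted gene classes reported for E3.
gene_counts = {"protein_coding": 7607, "rRNA": 12, "tRNA": 50}

if __name__ == "__main__":
    depth = mean_depth(READ_COUNT, READ_LENGTH_BP, GENOME_SIZE_BP)
    print(f"approximate mean depth: {depth:.0f}x")
    print("total predicted genes:", sum(gene_counts.values()))
```

Under these assumptions the estimate is roughly 220-fold, and the three gene classes sum to the 7,669 predicted genes.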
The E3 genome is 31.72 kb shorter than that of NRRL23338 (Oliynyk et al., 2007). Nevertheless, 16 coding sequences are present in more copies in E3 than in NRRL23338 (Table S1). Seven of these 16 multi-copy genes encode transposases, for example SACE_2214 (10 copies), SACE_2314 (8 copies) and SACE_2316 (9 copies), which are scattered across the E3 genome. Some insertion sequences (e.g., IS30 and IS256) can alter the expression of neighboring genes by forming new strong promoters or transcriptional termination signals (Nagy & Chandler, 2004). The redistribution of transposase genes may therefore be one of the factors underlying transcriptional changes in some genes, and it may also indicate more active DNA transposition in E3. Furthermore, highly active transposable elements with increased copy numbers might serve as useful tools for the heterologous expression of gene clusters for other secondary metabolites in S. erythraea (Kiss, Szabo, & Olasz, 2003). Interestingly, every gene in Table S1 other than those encoding transposases is present in exactly two copies in E3. One subgroup of these duplicated genes, comprising the acetyl-CoA synthetase gene (SACE_0337) and a proline-specific permease gene (SACE_4252), is associated with supplying precursors for erythromycin synthesis. A second subgroup encodes the 3-phenylpropionate dioxygenase ferredoxin subunit (SACE_4486), superoxide dismutase (SACE_0619) and aldehyde dehydrogenase (SACE_2377), whose functions are associated with regulation of the intracellular redox status. A putative transcriptional regulator gene (SACE_2715) is also duplicated.
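The text does not state how copy numbers were determined. One plausible approach, sketched here under explicit assumptions, is to align each NRRL23338 CDS against the E3 assembly and count the distinct loci that carry a high-identity, near-full-length hit. The `Hit` record, its fields and the 95% identity / 90% coverage thresholds are hypothetical choices, not parameters from this study.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One alignment of a reference CDS onto the E3 assembly (hypothetical format)."""
    gene_id: str
    contig: str
    start: int
    identity: float   # percent identity of the alignment
    coverage: float   # percent of the CDS length covered by the alignment


def copy_numbers(hits: Iterable[Hit], min_identity: float = 95.0,
                 min_coverage: float = 90.0) -> Counter:
    """Count distinct loci per gene among hits passing both (assumed) thresholds."""
    loci = {(h.gene_id, h.contig, h.start) for h in hits
            if h.identity >= min_identity and h.coverage >= min_coverage}
    return Counter(gene for gene, _, _ in loci)


def multiplied_genes(e3: Counter, ref: Counter) -> Dict[str, Tuple[int, int]]:
    """Genes with more copies in E3 than in the reference, as (ref_copies, e3_copies)."""
    return {g: (ref.get(g, 0), n) for g, n in e3.items() if n > ref.get(g, 0)}


if __name__ == "__main__":
    # Toy input only (not E3 data): geneA hits three loci; geneB has two
    # passing hits and one weak hit that the thresholds discard.
    toy_hits = [Hit("geneA", "c1", s, 99.5, 100.0) for s in (100, 5000, 9000)]
    toy_hits += [Hit("geneB", "c1", 200, 100.0, 100.0),
                 Hit("geneB", "c1", 70000, 99.9, 98.0),
                 Hit("geneB", "c1", 80000, 80.0, 50.0)]
    print(multiplied_genes(copy_numbers(toy_hits), Counter({"geneA": 1, "geneB": 1})))
```

Collapsing hits into a set of (gene, contig, start) tuples keeps a locus from being counted twice if the aligner reports redundant hits at the same position.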
Conversely, a total of 31 kb of sequence, distributed across 56 coding sequences, is deleted in E3 (Table S2). Here, a deletion was defined as a missing fragment longer than 100 bp. The functions of these 56 damaged genes were considered blocked because of the missing nucleotides. Because E3 remains an enhanced erythromycin producer despite these losses, it is reasonable to assume that the deleted sequences do not play important roles in erythromycin biosynthesis. Apart from deletions in sequences of unknown function, a considerable number of deletions occur in genes responsible for replication and repair. A long fragment spanning SACE_0239 to SACE_0248, previously predicted to be a prophage (Y. Li et al., 2013), is entirely deleted. Deletions in five genes encoding substrate transporters (SACE_2777/2787/2907/4254/5157) imply that the substrate profile of E3 likely differs from that of NRRL23338. Another complete deletion spans SACE_4326 to SACE_4331, a region linked to terpenoid metabolism. However, the relationship between terpenoids and erythromycin remains poorly understood.
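The 100 bp deletion threshold can be made concrete with a minimal interval sketch. It assumes, hypothetically, that the E3 assembly has already been aligned to the NRRL23338 reference and that the aligned blocks are available as half-open intervals in reference coordinates; reference segments left uncovered and longer than 100 bp are reported as deletions, and each CDS is scored by how many of its bases fall inside them. The function names and input formats are illustrative only and do not describe the pipeline used in this study.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in reference coordinates

MIN_DELETION_BP = 100  # deletions are missing fragments longer than 100 bp


def find_deletions(covered: List[Interval], genome_len: int,
                   min_len: int = MIN_DELETION_BP) -> List[Interval]:
    """Return reference segments not covered by any E3 block and longer than min_len."""
    gaps: List[Interval] = []
    pos = 0  # furthest reference position covered so far
    for start, end in sorted(covered):
        if start - pos > min_len:
            gaps.append((pos, start))
        pos = max(pos, end)  # handles overlapping alignment blocks
    if genome_len - pos > min_len:
        gaps.append((pos, genome_len))
    return gaps


def damaged_genes(deletions: List[Interval],
                  cds: Dict[str, Interval]) -> Dict[str, int]:
    """Map each CDS id to the number of its bases lying inside deletions."""
    damaged: Dict[str, int] = {}
    for gene, (gs, ge) in cds.items():
        lost = sum(max(0, min(ge, de) - max(gs, ds)) for ds, de in deletions)
        if lost:
            damaged[gene] = lost
    return damaged


if __name__ == "__main__":
    # Toy coordinates only: a 50 bp gap is ignored, a 300 bp gap is kept.
    dels = find_deletions([(0, 1000), (1050, 5000), (5300, 9000)], 10000)
    print(dels)
    print(damaged_genes(dels, {"g1": (4900, 5400), "g2": (100, 900)}))
```

Summing the lengths of the returned intervals gives the total deleted sequence, the quantity reported above as 31 kb.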
A total of 255 single nucleotide polymorphisms (SNPs) were identified in E3 (Table S3), 160 of which are located in the core region of the S. erythraea genome (Oliynyk et al., 2007). 128 nonsynonymous, nonsense or stop-codon SNPs lie within 118 coding regions. Nonsynonymous SNPs may alter protein conformation and thereby modify protein activity (Ferro et al., 2017; Nakken, Alseth, & Rognes, 2007). Another 51 mutations occur in intergenic regions adjacent to 44 coding sequences; these may affect the processing and segmental stability of transcripts containing multiple coding regions (Smolke & Keasling, 2002). To explore the underlying mechanisms by which erythromycin biosynthesis is enhanced in E3, genes carrying SNPs were functionally classified (Fig. 2). The five categories with the most SNPs were [K] transcription, [E] amino acid metabolism and transport, [C] energy production and conversion, [G] carbohydrate metabolism and transport, and [Q] secondary metabolite biosynthesis, transport and catabolism. By mapping genes with nonsynonymous or intergenic SNPs onto metabolic pathways, several key nodes/pathways for erythromycin biosynthesis were identified (Figure S1). In general, these nodes/pathways are closely related to erythromycin biosynthesis through the supply of precursors and cofactors or through signal transduction (Z. Xu, You, Tang, Zhou, & Ye, 2019). They are [a] the conversion of isocitrate to 2-oxoglutarate or into the glyoxylate shunt, [b] oxidative phosphorylation, [c] the conversion of 2-oxoglutarate to glutamate, [d] lipid metabolism, [e] S-adenosylmethionine (SAM) metabolism, [f] valine, leucine and isoleucine metabolism, [g] propanoate metabolism, [h] pyruvate synthesis, [i] thiamine metabolism, [j] tryptophan/tyrosine metabolism and [k] the pentose phosphate pathway (PPP)/metabolism of phosphoribosyl diphosphate (PRPP).
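The SNP categories used above (synonymous, nonsynonymous, nonsense, intergenic) follow from where a substitution falls and which codon it changes. The sketch below classifies a single-base substitution, assuming the CDS sequence is supplied 5'→3' on the coding strand (minus-strand genes would need to be reverse-complemented first). It uses the standard codon table, whose amino acid and stop assignments match bacterial translation table 11; alternative start codons are not handled. All function names are hypothetical and do not describe the annotation tool used in this study.

```python
from typing import List, Tuple

# Standard genetic code built from the canonical TCAG ordering of bases.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_snp(cds_seq: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of a coding-strand CDS."""
    codon_start = pos - pos % 3
    ref_codon = cds_seq[codon_start:codon_start + 3].upper()
    if len(ref_codon) != 3:
        raise ValueError("position falls outside a complete codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # premature stop codon
    if ref_aa == "*":
        return "stop-loss"     # stop codon converted to a sense codon
    return "nonsynonymous"


def is_intergenic(genome_pos: int, cds_intervals: List[Tuple[int, int]]) -> bool:
    """True if a genome position lies outside every half-open CDS interval."""
    return all(not (s <= genome_pos < e) for s, e in cds_intervals)


if __name__ == "__main__":
    toy_cds = "ATGTGGAAATAA"  # toy sequence: Met-Trp-Lys-stop
    print(classify_snp(toy_cds, 5, "A"))  # TGG -> TGA
    print(classify_snp(toy_cds, 6, "G"))  # AAA -> GAA
```

Intergenic SNPs are set aside first with `is_intergenic`; only substitutions inside a CDS are passed to `classify_snp`.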
Among the genes with nonsynonymous or intergenic SNPs, those encoding isocitrate lyase (SACE_1449), isocitrate dehydrogenase (SACE_6636) and 2-oxoglutarate dehydrogenase subunits (SACE_1638 and SACE_6385) surround the 2-oxoglutarate node. Under high-erythromycin-producing conditions, more of the carbon flux through isocitrate was drained into succinyl-CoA via 2-oxoglutarate than into the glyoxylate shunt (Hong, Huang, Chu, Zhuang, & Zhang, 2016). We therefore hypothesize that in E3 a larger share of the carbon flux at the isocitrate node is directed towards succinyl-CoA and subsequently towards methylmalonyl-CoA, the extender unit for erythromycin polyketide assembly. This redirection of carbon flow at the isocitrate node is likely achieved through changes in enzyme activity caused by the nonsynonymous SNPs. Another important SNP lies in the intergenic region adjacent to SACE_3400, which encodes an acetyl-CoA carboxylase (Acc) subunit. Seven genes (SACE_0934/1282/1341/1764/2786/4937/7125) involved in the PPP or downstream purine metabolism are mutated, suggesting that the PPP and purine metabolism are important for erythromycin biosynthesis. There are also two SNPs in the erythromycin biosynthetic gene cluster (BGC): a nonsynonymous mutation inside SACE_0718 (eryCVI) and an intergenic mutation upstream of SACE_0720 (eryBIV). These two genes were also mutated in another high erythromycin producer, Px (Peano et al., 2012). Given that eryCVI and eryBIV are necessary for erythromycin biosynthesis (Summers et al., 1997), the two mutations likely enhance the total enzymatic activities of their respective gene products.
In addition to the nonsynonymous mutations, six CDSs carry nonsense SNPs (Table 2), which introduce premature stop codons and likely truncate the encoded proteins. Two polyketide synthase genes (SACE_0019 and SACE_2875) are affected by nonsense mutations. SACE_0019 lies in the pfa cluster, which appears to govern the biosynthesis of polyunsaturated fatty acids such as eicosapentaenoic acid (Oliynyk et al., 2007). SACE_2875 is part of the pks3 gene cluster, which is involved in the production of a low-molecular-weight aromatic polyketide antibiotic. Consequently, synthesis of these two secondary metabolites is expected to be abolished or reduced by the nonsense mutations. Because erythromycin biosynthesis also depends on polyketide synthases (PKSs), inactivating or attenuating these other PKS-dependent pathways may reduce competition for shared precursors between erythromycin and other polyketides.