Genomic comparison of E3 and NRRL23338
Re-sequencing of the whole genome of E3 was performed using an Illumina
HiSeq 2000 platform, resulting in approximately 1.2×10⁷ raw reads of 150 bp. The overall features of the E3 genome are highly
similar to those of NRRL23338 (Table 1) (Oliynyk et al., 2007). The E3
genome is 8,181,083 bp long with a GC content of 71.81%. A total of 7,669
coding sequences were predicted bioinformatically, including 7,607
protein-coding sequences, 12 rRNA genes and 50 tRNA genes. E3 presents
an ncRNA profile identical to that of NRRL23338, which indicates a
relatively conserved translation process after mutagenesis.
The E3 genome is 31.72 kb shorter than that of NRRL23338 (Oliynyk et
al., 2007). However, 16 coding sequences are present in more copies in
E3 than in NRRL23338 (Table S1). Seven of the 16 multi-copy genes encode
transposases, e.g., 10 copies of SACE_2214, 8 copies of SACE_2314 and
9 copies of SACE_2316, which are scattered across the E3 genome. Some
insertion sequences (e.g., IS30 and IS256) can alter the expression
levels of their neighboring genes by forming either new strong promoters
or transcriptional termination signals (Nagy & Chandler, 2004). Therefore,
the rearrangement of transposase genes might be one of the factors
leading to transcriptional changes of some genes. It may also indicate a
more active DNA transposition in E3. Furthermore, highly active
transposable elements with increased copy numbers might be useful tools
for heterologous expression of gene clusters of other secondary
metabolites in S. erythraea (Kiss, Szabo, & Olasz, 2003).
Interestingly, the copy numbers of all genes in Table S1 other than
those encoding transposases are doubled in E3. One subgroup of the
doubled genes includes the acetyl-CoA synthetase gene (SACE_0337) and
the proline-specific permease gene (SACE_4252), which are associated
with supplying precursors for the synthesis of erythromycin. A second
subgroup of doubled genes encodes the 3-phenylpropionate dioxygenase
ferredoxin subunit (SACE_4486), superoxide dismutase (SACE_0619) and
aldehyde dehydrogenase (SACE_2377), whose functions are associated
with regulating the intracellular redox status. A putative
transcriptional regulator (SACE_2715) is also doubled.
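The copy-number comparison described above can be sketched as a simple tally of how often each gene is found in the two genomes. This is a minimal illustration only, not the pipeline used in this study: the function name `copy_number_changes`, the input format (one list entry per detected copy) and the gene names are all hypothetical.

```python
from collections import Counter


def copy_number_changes(ref_hits, query_hits):
    """Return {gene: (ref_copies, query_copies)} for genes whose
    copy number differs between the reference and the query genome.

    Each input is a list with one entry per detected copy of a gene
    (hypothetical format, e.g. derived from alignment hits).
    """
    ref_counts = Counter(ref_hits)
    query_counts = Counter(query_hits)
    changes = {}
    for gene in sorted(set(ref_counts) | set(query_counts)):
        pair = (ref_counts[gene], query_counts[gene])
        if pair[0] != pair[1]:
            changes[gene] = pair
    return changes


# Toy input with made-up gene names (not data from this study).
ref_hits = ["geneA", "geneB", "geneC"]
query_hits = ["geneA", "geneA", "geneB", "geneC", "geneC", "geneC"]

changes = copy_number_changes(ref_hits, query_hits)
for gene, (ref_n, query_n) in changes.items():
    print(f"{gene}: {ref_n} -> {query_n} copies")
```

In this toy input, geneA is doubled and geneC is tripled, while geneB is unchanged and therefore not reported.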
On the other hand, a total of 31 kb of nucleotides, distributed across 56
coding sequences, are deleted in E3 (Table S2). A deletion was defined
here as a missing fragment larger than 100 bp. The functions of
these 56 damaged genes were considered blocked due to the missing
nucleotides. It is thus reasonable to assume that the deleted sequences
do not play important roles in the biosynthesis of erythromycin. Apart
from deletions in sequences of unknown function, a considerable number
of deletions are observed in genes responsible for replication and repair.
A long fragment spanning SACE_0239~0248, previously predicted to be a
prophage (Y. Li et al., 2013), is entirely deleted. Deletions in five
genes (SACE_2777/2787/2907/4254/5157) coding for substrate transporters
imply that the substrate profile of E3 is likely different from that
of NRRL23338. Another complete deletion is observed at
SACE_4326~4331, which is linked to terpenoid
metabolism. However, the relation between terpenoids and erythromycin
remains poorly understood.
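The deletion criterion stated above (a missing fragment larger than 100 bp) can be written as a short filter. This is a sketch under assumptions, not the method used in this study: the `Gap` record, its 1-based inclusive coordinates and the toy gene names are hypothetical.

```python
from typing import NamedTuple


class Gap(NamedTuple):
    """A stretch of the reference gene that is missing in the mutant."""
    gene: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


MIN_DELETION_BP = 100  # threshold from the text: larger than 100 bp


def gap_length(gap):
    return gap.end - gap.start + 1


def call_deletions(gaps, min_len=MIN_DELETION_BP):
    """Keep only gaps strictly longer than min_len."""
    return [g for g in gaps if gap_length(g) > min_len]


def summarize(deletions):
    """Return (total deleted bp, number of affected genes)."""
    total_bp = sum(gap_length(g) for g in deletions)
    n_genes = len({g.gene for g in deletions})
    return total_bp, n_genes


# Toy gaps with made-up names and coordinates (not data from this study).
gaps = [
    Gap("geneX", 1, 100),    # exactly 100 bp: not counted as a deletion
    Gap("geneX", 501, 750),  # 250 bp
    Gap("geneY", 10, 160),   # 151 bp
    Gap("geneZ", 5, 20),     # 16 bp
]

deletions = call_deletions(gaps)
total_bp, n_genes = summarize(deletions)
print(f"{total_bp} bp deleted across {n_genes} genes")
```

Using a strict inequality follows the wording "larger than 100 bp"; a gap of exactly 100 bp is therefore excluded.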
In E3, we observed a total of 255 single nucleotide polymorphisms (SNPs)
(Table S3). Of these, 160 SNPs are located in the core region of the
S. erythraea genome (Oliynyk et al., 2007). A total of 128
nonsynonymous/nonsense/stop-mutation SNPs are located inside 118 coding
regions. Nonsynonymous SNPs could change the conformation of a protein,
resulting in a modified activity of that protein (Ferro et al.,
2017; Nakken, Alseth, & Rognes, 2007). A further 51 mutations occur in
intergenic regions of 44 coding sequences, which probably affect the
processing and segmental stability of transcripts containing multiple
coding regions (Smolke & Keasling, 2002). To elucidate the underlying
mechanisms by which the biosynthesis of erythromycin in E3 was enhanced,
a functional analysis of genes with SNPs was performed (Fig. 2). The top
five categories by number of SNPs were [K] transcription, [E] amino
acid metabolism and transport, [C] energy production and conversion,
[G] carbohydrate metabolism and transport, and [Q] secondary
metabolites biosynthesis, transport and catabolism. By mapping genes
with nonsynonymous or intergenic SNPs to metabolic pathways,
several key nodes/pathways for the biosynthesis of erythromycin were
identified (Figure S1). In general, the key nodes/pathways were tightly
related to the biosynthesis of erythromycin by supplying precursors and
cofactors or by signal transduction (Z. Xu, You, Tang, Zhou, & Ye,
2019). The key nodes/pathways were [a] isocitrate to 2-oxoglutarate
or glyoxylate shunt, [b] oxidative phosphorylation, [c] 2-oxoglutarate
to glutamate, [d] lipid metabolism, [e] S-adenosylmethionine
(SAM) metabolism, [f] valine, leucine and isoleucine metabolism,
[g] propanoate metabolism, [h] pyruvate synthesis, [i]
thiamine metabolism, [j] tryptophan/tyrosine metabolism and [k]
pentose phosphate pathway (PPP)/metabolism of phosphoribosyl
diphosphate (PRPP). Among genes with nonsynonymous or
intergenic SNPs, genes encoding isocitrate lyase (SACE_1449),
isocitrate dehydrogenase (SACE_6636) and 2-oxoglutarate dehydrogenase
subunits (SACE_1638 and SACE_6385) surround the node of 2-oxoglutarate. Under
high-erythromycin producing conditions, more of the carbon flux through
isocitrate was drained into succinyl-CoA via 2-oxoglutarate than into
the glyoxylate shunt (Hong, Huang, Chu, Zhuang, & Zhang,
2016). We can assume that in E3 more carbon flux through the isocitrate
node flows towards succinyl-CoA and subsequently to methylmalonyl-CoA. The
re-direction of carbon flow at the isocitrate node is likely
accomplished by altered enzymatic activities owing to nonsynonymous
SNPs. Another important SNP is the one in the intergenic region of
SACE_3400, which encodes an acetyl-CoA carboxylase (Acc) subunit. Seven
genes (SACE_0934/1282/1341/1764/2786/4937/7125) involved in PPP or
downstream purine metabolism are mutated, which suggests the importance
of PPP or purine metabolism to the biosynthesis of erythromycin. There
are also two SNPs in the erythromycin biosynthetic gene cluster (BGC).
One nonsynonymous mutation occurs inside SACE_0718 (eryCVI) and one
intergenic mutation is observed in front of SACE_0720 (eryBIV). These
two genes were
also mutated in another high erythromycin producer, Px (Peano et al.,
2012). Given the necessity of eryCVI and eryBIV for the
biosynthesis of erythromycin, the two mutations are very likely to
enhance the total enzymatic activities of their respective coding
products (Summers et al., 1997).
Apart from the nonsynonymous mutations, 6 CDSs carry nonsense SNPs,
which likely cause premature termination of translation of the affected
genes (Table 2). Two polyketide synthases (SACE_0019 and SACE_2875) are
affected by nonsense mutations. SACE_0019 is located in the pfa cluster,
which appears to govern the biosynthesis of polyunsaturated fatty acids
such as eicosapentaenoic acid (Oliynyk et al., 2007). SACE_2875 is part
of the pks3 gene cluster involved in the production of an aromatic
polyketide antibiotic with low molecular weight. As a result, synthesis
of these two kinds of secondary metabolites should be inactivated or
diminished by the nonsense mutations. The biosynthesis of erythromycin
also depends on polyketide synthases (PKS); therefore, the inactivation
or diminishment of pks-related synthesis pathways may weaken the
competition for precursors between the synthesis of erythromycin and
that of other polyketides.
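The SNP classes discussed in this section (synonymous, nonsynonymous, nonsense) differ only in how a single base change alters the encoded amino acid. The following minimal sketch illustrates that distinction. It is not the annotation tool used in this study: the function name `classify_snp` and the label "stop-lost" are assumptions made for illustration, and the standard codon assignments are used (the bacterial table shares them and differs only in start codons).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_snp(codon, pos, alt):
    """Classify a single-base substitution within one codon.

    codon: reference codon on the coding strand (e.g. "GAG")
    pos:   0-based position within the codon (0, 1 or 2)
    alt:   substituted base
    """
    mutant = codon[:pos] + alt + codon[pos + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"       # new premature stop codon
    if ref_aa == "*":
        return "stop-lost"      # original stop codon removed (assumed label)
    return "nonsynonymous"      # amino acid substitution


# Illustrative codons only (not SNPs from this study).
examples = [("GCC", 2, "G"), ("GAG", 1, "T"), ("TGG", 2, "A"), ("TGA", 2, "G")]
results = [classify_snp(*e) for e in examples]
for (codon, pos, alt), label in zip(examples, results):
    print(codon, pos, alt, label)
```

A nonsense substitution such as TGG to TGA replaces a tryptophan codon with a stop codon, which is why such SNPs are expected to yield truncated products.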