2. Model development
2.1. Pan-genome formulation
Twenty-four completely assembled SRP genome sequences, including 5
archaea of the genus Archaeoglobus, 15 Gram-negative bacteria of
the genus Desulfovibrio, and 4 Gram-positive bacteria of the
genus Desulfotomaculum (Table 1), were extracted from the NCBI
website (Brister et al., 2014). The genomic features, gene contents, and
accession numbers of the genomic sequences of the SRPs are summarized in
Table 1. Formulation of the pan-genome for each genus relies on
iterative pairwise comparison of the selected genomes within the genus
to identify groups of orthologous genes (OGs) that encode proteins with
the same function. To improve the reliability of computation,
orthologous groups were identified by a combination of two platforms: i)
the EDGAR software platform, which employs BLAST Score Ratio Values
(SRVs) with an orthology cutoff calculated from
the analyzed genome set rather than a fixed threshold (Blom et
al., 2016), and ii) the pan-genome analysis pipeline BPGA with the
default 50% sequence identity as the cutoff value for ortholog
clustering (Chaudhari et al., 2016). The final list of OGs (Table S1)
comprised the OGs identified by both platforms. OGs predicted by only
one of the two platforms were manually inspected via BLASTP
(protein-protein) search (Camacho et al., 2009). A phylogenetic tree to
illustrate the genetic relationships between the 24 SRP genomes was
constructed using the FastTree tool (M. N. Price et al., 2010).
Functional categories of genes within three different pan-genome
subsets, i.e., core, accessory, and unique genomes, were predicted and
classified using the Clusters of Orthologous Groups of proteins (COG)
database (Tatusov et al., 2000). COG annotation was conducted on the
WebMGA server, using the default e-value cutoff of 0.001 for prediction
(Wu et al., 2011).
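The consensus and partitioning steps described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the EDGAR or BPGA implementation: the function names, the toy gene identifiers, and the genome labels are hypothetical. It also assumes the conventional definitions of the three subsets (core: present in every genome; unique: present in exactly one genome; accessory: everything in between), which this section does not spell out.

```python
def consensus_ogs(edgar_ogs, bpga_ogs):
    """Return (agreed, to_review): OGs found by both platforms and by only one."""
    edgar = {frozenset(og) for og in edgar_ogs}
    bpga = {frozenset(og) for og in bpga_ogs}
    agreed = edgar & bpga       # accepted into the final OG list
    to_review = edgar ^ bpga    # candidates for manual BLASTP inspection
    return agreed, to_review


def partition_pan_genome(ogs, gene_to_genome, n_genomes):
    """Classify each OG by the number of distinct genomes it spans."""
    subsets = {"core": set(), "accessory": set(), "unique": set()}
    for og in ogs:
        genomes = {gene_to_genome[gene] for gene in og}
        if len(genomes) == n_genomes:
            subsets["core"].add(og)
        elif len(genomes) == 1:
            subsets["unique"].add(og)
        else:
            subsets["accessory"].add(og)
    return subsets


# Hypothetical toy input: three genomes (A, B, C), genes named <genome><index>.
gene_to_genome = {g: g[0].upper() for g in
                  ["a1", "b1", "c1", "a2", "b2", "a3", "b4", "c4"]}
edgar_ogs = [{"a1", "b1", "c1"}, {"a2", "b2"}, {"a3"}, {"b4", "c4"}]
bpga_ogs = [{"a1", "b1", "c1"}, {"a2", "b2"}, {"a3"}, {"b4"}]

agreed, to_review = consensus_ogs(edgar_ogs, bpga_ogs)
subsets = partition_pan_genome(agreed, gene_to_genome, n_genomes=3)
```

In this toy case the two platforms disagree on how to group gene b4, so both competing groupings land in `to_review` rather than in the final list.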
2.2. Construction of multiple metabolic models
The concept of a core metabolic model was adopted in this study to focus
on energy metabolism and improve the efficiency of model construction.
The model construction workflow began with creating a core model
template (Figure 1). The SRP core model template consisted of
glycolysis, the tricarboxylic acid (TCA) cycle, the pentose phosphate
pathway, fermentation, DSR, and various electron transport chain (ETC)
pathways. Core reactions with associated functional proteins in the
model template were derived from Edirisinghe et al. (2016) and J. D.
Orth et al. (2010a) in the context of pan-genome analysis. In addition,
DSR and ETC pathways were included through literature mining and
curation.
The ortholog table obtained from
pan-genome analysis, which consisted of orthologous genes associated
with functional proteins, was compared against the SRP core model
template, which encompassed functional proteins associated with
biochemical reactions. This comparison established the
presence or absence of specific biochemical reactions and pathways,
resulting in a set of gene-protein-reaction (GPR) associations. Once the GPR associations were
determined, the associated biochemical data were propagated to construct a
draft metabolic model. An objective function (OF) of biomass
biosynthesis, which was optimized during FBA to predict flux profiles
(Schuetz et al., 2007), was obtained from Edirisinghe et al. (2016) and
added to each draft model. The draft model was then imported into the
COBRA Toolbox in MATLAB (Heirendt et al., 2017) for minimal
gap-filling. A total of 24 metabolic models of SRPs were constructed,
among which DvH was chosen as the model organism for model validation
because it is well studied and encompasses versatile ETC pathways that
are shared by most SRP species (I. A. C. Pereira et al., 2011). Both
growth-associated maintenance (GAM) and non-growth-associated
maintenance (NGAM) were added to the DvH model. NGAM quantifies the
energy required by SRPs to maintain themselves in a given environment,
while GAM quantifies growth energy requirements not included in the
metabolic model (Kavvas et al., 2018). Literature data obtained from
chemostat experiments were used for GAM and NGAM calculation (Badziong &
Thauer, 1978; Traore et al., 1981).
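How GAM and NGAM were derived from the chemostat data is not detailed in this section. One common approach, assumed here purely for illustration, is the linear Pirt relation q_ATP = GAM · μ + NGAM, where μ is the growth (dilution) rate and q_ATP is the specific ATP production rate; the slope of a least-squares fit then gives GAM and the intercept gives NGAM. The function name and the data points below are hypothetical and are not values from the cited studies.

```python
def fit_maintenance(growth_rates, atp_rates):
    """Least-squares fit of q_ATP = GAM * mu + NGAM; returns (GAM, NGAM)."""
    n = len(growth_rates)
    if n < 2 or n != len(atp_rates):
        raise ValueError("need at least two paired observations")
    mean_mu = sum(growth_rates) / n
    mean_q = sum(atp_rates) / n
    sxx = sum((mu - mean_mu) ** 2 for mu in growth_rates)
    if sxx == 0:
        raise ValueError("growth rates must not all be identical")
    sxy = sum((mu - mean_mu) * (q - mean_q)
              for mu, q in zip(growth_rates, atp_rates))
    gam = sxy / sxx                   # slope: mmol ATP per gDW
    ngam = mean_q - gam * mean_mu     # intercept: mmol ATP per gDW per h
    return gam, ngam


# Hypothetical chemostat points (NOT literature values), generated from
# q_ATP = 50 * mu + 1 so that the expected fit is known in advance.
mu_values = [0.02, 0.05, 0.10, 0.15]   # dilution rate, 1/h
q_atp_values = [2.0, 3.5, 6.0, 8.5]    # mmol ATP per gDW per h

gam, ngam = fit_maintenance(mu_values, q_atp_values)
```

In a constraint-based model, the fitted GAM would typically enter as the ATP coefficient of the biomass reaction and NGAM as a lower bound on an ATP hydrolysis reaction.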
2.3. Model simulations by flux balance analysis
To make rigorous and quantitative growth predictions, each metabolic model
was converted into a stoichiometric matrix (S), in which S_ij represents the stoichiometric coefficient of
metabolite i in reaction j. The reaction fluxes
(reaction rates) are represented by a vector v. FBA was used to
optimize specific OFs such as biomass or targeted metabolites under
the steady-state constraint (S·v = 0), thereby making it possible to
predict the growth rate of an organism or the rate of production of a
biotechnologically important metabolite (Oberhardt et al., 2009; J. D.
Orth et al., 2010b; N. D. Price et al., 2004). In this study,
simulations were performed with maximal biomass production as
the OF of SRPs. Limiting substrate uptake rates were adopted from
literature data (Badziong & Thauer, 1978; Traore et al., 1981).
The predicted fluxes can be visualized on an Escher map
(King et al., 2015).
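For reference, the optimization described in this section can be written as a standard linear program. The symbols c, v^min, and v^max are notation introduced here for illustration and do not appear in the original text: c is a vector that selects the objective (here, a 1 at the biomass reaction and 0 elsewhere), and the bounds encode reaction reversibility, the limiting substrate uptake rates, and the NGAM requirement.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1, \dots, n
\end{aligned}
```

Here n is the number of reactions in the model. Because the objective and all constraints are linear in v, the optimal biomass flux is unique, although the flux distribution that attains it need not be.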