2. Model development

2.1. Pan-genome formulation

Twenty-four completely assembled SRP genome sequences were retrieved from NCBI (Brister et al., 2014): 5 archaea of the genus Archaeoglobus, 15 Gram-negative bacteria of the genus Desulfovibrio, and 4 Gram-positive bacteria of the genus Desulfotomaculum. The genomic features, gene contents, and accession numbers of these genomes are summarized in Table 1. The pan-genome of each genus is formulated by iterative pairwise comparison of the selected genomes within that genus, which identifies groups of orthologous genes (OGs) encoding proteins with the same function. To improve reliability, OGs were identified with two platforms: (i) the EDGAR software platform, which uses BLAST score ratio values (SRVs) with an orthology cutoff calculated from the analyzed genome set rather than a fixed threshold (Blom et al., 2016), and (ii) the pan-genome analysis pipeline BPGA, with its default 50% sequence identity cutoff for ortholog clustering (Chaudhari et al., 2016). The final list of OGs (Table S1) comprised the OGs identified by both platforms. OGs predicted by only one of the two platforms were inspected manually with BLASTP (protein-protein) searches (Camacho et al., 2009). A phylogenetic tree showing the genetic relationships among the 24 SRP genomes was constructed with FastTree (M. N. Price et al., 2010). Genes in the three pan-genome subsets, i.e., the core, accessory, and unique genomes, were assigned to functional categories using the Clusters of Orthologous Groups of proteins (COG) database (Tatusov et al., 2000). COG annotation was performed on the WebMGA server with the default e-value cutoff of 0.001 (Wu et al., 2011).
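The reconciliation of the two OG lists and the split into core, accessory, and unique subsets reduce to set operations. The Python sketch below is illustrative only. It assumes each OG is represented as a frozen set of gene identifiers, so an OG counts as "identified by both platforms" only when EDGAR and BPGA return exactly the same membership (a simplifying assumption; real clusters may need fuzzier matching). It also assumes the common subset definitions: core OGs occur in every genome, unique OGs in exactly one genome, and accessory OGs in more than one but not all. All gene and genome names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# An orthologous group (OG) is modelled as the set of gene IDs it contains.
OG = FrozenSet[str]


def reconcile(edgar: Set[OG], bpga: Set[OG]) -> Tuple[Set[OG], Set[OG]]:
    """Split OGs into those called by both tools and those called by only one.

    Agreement means identical membership, a simplifying assumption.
    """
    agreed = edgar & bpga    # called by both tools: accepted directly
    flagged = edgar ^ bpga   # called by one tool only: manual BLASTP check
    return agreed, flagged


def classify(ogs: Iterable[OG], genome_of: Dict[str, str], n_genomes: int) -> Dict[OG, str]:
    """Label each OG as core, accessory, or unique by how many genomes it spans."""
    labels: Dict[OG, str] = {}
    for og in ogs:
        n_present = len({genome_of[gene] for gene in og})
        if n_present == n_genomes:
            labels[og] = "core"
        elif n_present == 1:
            labels[og] = "unique"
        else:
            labels[og] = "accessory"
    return labels


if __name__ == "__main__":
    # Hypothetical genes a*, b*, c* belonging to three genomes A, B, C.
    genome_of = {"a1": "A", "b1": "B", "c1": "C", "a2": "A", "b2": "B", "c3": "C"}
    edgar = {frozenset({"a1", "b1", "c1"}), frozenset({"a2", "b2"}), frozenset({"c3"})}
    bpga = {frozenset({"a1", "b1", "c1"}), frozenset({"a2", "b2"})}
    agreed, flagged = reconcile(edgar, bpga)
    labels = classify(agreed, genome_of, n_genomes=3)
    for og, label in sorted(labels.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(og), label)
    print("for manual check:", [sorted(og) for og in flagged])
```

In this toy case the three-genome OG is labelled core, the two-genome OG accessory, and the single-gene OG reported by only one tool is set aside for manual inspection.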

2.2. Construction of multiple metabolic models

The concept of a core metabolic model was adopted in this study to focus on energy metabolism and to make model construction more efficient. The model construction workflow began with the creation of a core model template (Figure 1). The SRP core model template comprised glycolysis, the tricarboxylic acid (TCA) cycle, the pentose phosphate pathway, fermentation, DSR, and various electron transport chain (ETC) pathways. Core reactions and their associated functional proteins in the template were derived from Edirisinghe et al. (2016) and J. D. Orth et al. (2010a) in the context of the pan-genome analysis. In addition, DSR and ETC pathways were included through literature mining and curation.
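A template of this kind pairs each core reaction with the functional proteins that catalyze it. The Python sketch below shows one possible representation, assuming gene-protein-reaction rules written as lists of alternative protein complexes: a reaction is retained for a genome when every subunit of at least one complex is present among that genome's functional proteins. Reaction and protein identifiers are hypothetical placeholders, not entries taken from Edirisinghe et al. (2016) or J. D. Orth et al. (2010a).

```python
from typing import Dict, List, Set

# Hypothetical template: reaction ID -> alternative protein complexes.
# A reaction is supported if ANY complex (OR) has ALL of its proteins (AND).
Rule = List[List[str]]
TEMPLATE: Dict[str, Rule] = {
    "RXN_GLYC": [["protA"]],                                   # single enzyme
    "RXN_TCA": [["protB1", "protB2", "protB3"]],               # multi-subunit complex
    "RXN_ETC": [["protC1", "protC2"], ["protD1", "protD2"]],   # alternative complexes
}


def reaction_supported(rule: Rule, proteins: Set[str]) -> bool:
    """True if at least one complex in the rule is fully present."""
    return any(all(p in proteins for p in complex_) for complex_ in rule)


def draft_reactions(template: Dict[str, Rule], proteins: Set[str]) -> List[str]:
    """Return, sorted, the template reactions supported by a genome's proteins."""
    return sorted(rxn for rxn, rule in template.items() if reaction_supported(rule, proteins))


if __name__ == "__main__":
    # Hypothetical functional proteins inferred from one genome's OGs.
    genome_proteins = {"protA", "protB1", "protB2", "protD1", "protD2"}
    print(draft_reactions(TEMPLATE, genome_proteins))
```

Here RXN_TCA is dropped because one subunit (protB3) is missing, while RXN_ETC is kept through its second complex. Encoding rules as nested lists avoids parsing Boolean strings; a string-based "and"/"or" rule format would need a small parser instead.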
The ortholog table obtained from the pan-genome analysis, which lists orthologous genes and their associated functional proteins, was compared against the SRP core model template, which links functional proteins to biochemical reactions. This comparison determined the presence or absence of specific biochemical reactions and pathways in each genome and yielded a set of gene-protein-reaction (GPR) associations. Once the GPR associations were determined, the related biochemical data were propagated to construct a draft metabolic model. An objective function (OF) of biomass biosynthesis, which is optimized during FBA to predict flux profiles (Schuetz et al., 2007), was taken from Edirisinghe et al. (2016) and added to each draft model. Each draft model was then imported into the COBRA Toolbox in MATLAB (Heirendt et al., 2017) for minimal gap-filling. In total, 24 metabolic models of SRPs were constructed. Of these, DvH was chosen as the model organism for model validation because it is well studied and possesses versatile ETC pathways that are shared by most SRP species (I. A. C. Pereira et al., 2011). Both growth-associated maintenance (GAM) and non-growth-associated maintenance (NGAM) were added to the DvH model. NGAM quantifies the energy SRPs require to maintain themselves in a given environment, whereas GAM quantifies growth-related energy requirements not captured by the metabolic model (Kavvas et al., 2018). GAM and NGAM were calculated from literature data obtained in chemostat experiments (Badziong & Thauer, 1978; Traore et al., 1981).

2.3. Model simulations by flux balance analysis

To make rigorous, quantitative growth predictions, each metabolic model was converted into a stoichiometric matrix $\mathbf{S}$, in which $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$. The reaction fluxes (reaction rates) are collected in the vector $\mathbf{v}$. FBA was used to optimize a specified OF, such as biomass or a target metabolite, under the steady-state constraint $\mathbf{S}\cdot\mathbf{v} = \mathbf{0}$, which makes it possible to predict the growth rate of an organism or the production rate of a biotechnologically important metabolite (Oberhardt et al., 2009; J. D. Orth et al., 2010b; N. D. Price et al., 2004). In this study, simulations were performed with maximal biomass production as the OF of the SRPs. Limiting substrate uptake rates were taken from the literature (Badziong & Thauer, 1978; Traore et al., 1981). Predicted flux distributions can be visualized on an Escher map (King et al., 2015).
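The optimization described above is a linear program: maximize the flux of the objective reaction subject to $\mathbf{S}\cdot\mathbf{v} = \mathbf{0}$ and lower and upper bounds on every flux. The study itself used the COBRA Toolbox in MATLAB. Purely as an illustration, the Python sketch below solves the same kind of problem with SciPy's `linprog` (HiGHS solver) on a hypothetical three-metabolite network: an uptake reaction supplies A, A is converted into B and into C, and a biomass reaction consumes B and C. The uptake bound of 10 flux units is an arbitrary placeholder, not one of the literature uptake rates.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: metabolites A, B, C.
# Columns: EX_A (-> A), R1 (A -> B), R2 (A -> C), BIOMASS (B + C ->).
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  0, -1],   # B
    [0,  0,  1, -1],   # C
], dtype=float)
reactions = ["EX_A", "R1", "R2", "BIOMASS"]
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]  # EX_A is the limiting uptake


def fba(S: np.ndarray, bounds, objective_index: int):
    """Maximize flux through one reaction subject to S v = 0 and flux bounds."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, res.x


if __name__ == "__main__":
    growth, v = fba(S, bounds, reactions.index("BIOMASS"))
    print(f"optimal biomass flux: {growth:.2f}")  # prints "optimal biomass flux: 5.00"
    for name, flux in zip(reactions, v):
        print(f"{name:8s} {flux:7.2f}")
```

Because each unit of biomass flux consumes two units of A in this toy network, the optimal biomass flux is half the uptake bound; tightening the EX_A bound lowers the predicted growth proportionally, mirroring how limiting substrate uptake rates constrain predictions in the full models.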