Figure 2. The 5 levels of network models. Scope refers generally to the size of networks and the volume of interactions recorded at that level. Mechanistic detail refers to whether the stepwise processes of a reaction are explicitly given in the network model. Causality refers to whether the network model can be used to make causal inferences that can be statistically interrogated.
Gene Sets, as the name implies, are curated lists of genes grouped by association with a particular phenotypic outcome, molecular pathway, or cellular event. Gene sets, although not networks per se, are often derived from network representations, such as the boundaries of KEGG pathways. Pathway boundaries are fiat boundaries [24], induced primarily through human demarcation. For example, despite covering the same biological processes, KEGG pathways contain on average four times more entities than BioCyc [25] pathways, primarily due to differences in curation guidelines. The two sources also yield substantially different results when these fiat boundaries are used as input for gene set enrichment tasks [26]. Although they encompass well-described biological mechanisms, gene sets do not contain mechanistic detail in the form of directed and/or signed edges. Approaches at this level typically perform explanation/extraction: they test for statistical enrichment of gene sets or their components to propose explanations for observed cellular behavior, e.g. highlighting the most strongly enriched pathway in a cancer biopsy to identify possible therapeutic targets. This can also be extended to a phenotype prediction task if the gene set describes a particular phenotype, e.g. a gene set composed of markers for epithelial-to-mesenchymal transition in breast cancer cells.
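The enrichment testing described above can be illustrated with a minimal sketch of an over-representation test based on the hypergeometric distribution. This is only one of several enrichment statistics in use, and it is not claimed to be the method of any particular tool. The function name `enrichment_p_value` and the gene identifiers `G1` to `G20` are hypothetical placeholders.

```python
from math import comb


def enrichment_p_value(gene_set, hits, background):
    """One-sided hypergeometric p-value for over-representation.

    Returns the probability of seeing at least the observed overlap between
    `hits` and `gene_set` if `hits` were drawn at random, without
    replacement, from `background`.
    """
    background = set(background)
    gene_set = set(gene_set) & background  # ignore genes outside the background
    hits = set(hits) & background
    observed = len(gene_set & hits)
    max_overlap = min(len(gene_set), len(hits))
    total = comb(len(background), len(hits))
    tail = sum(
        comb(len(gene_set), k)
        * comb(len(background) - len(gene_set), len(hits) - k)
        for k in range(observed, max_overlap + 1)
    )
    return tail / total


# Toy data (hypothetical identifiers, not taken from any database).
background = {f"G{i}" for i in range(1, 21)}    # 20 measured genes
toy_gene_set = {"G1", "G2", "G3", "G4", "G5"}   # a curated gene set
toy_hits = {"G1", "G2", "G3", "G4", "G10"}      # genes flagged in an experiment

p_value = enrichment_p_value(toy_gene_set, toy_hits, background)
print(p_value)  # 76 / 15504 for this toy input
```

Because the result depends directly on which genes are counted as members of the set, changing the pathway boundary changes the p-value, which is one way to see why fiat boundaries matter for enrichment tasks.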
Interaction Networks represent interactions between biological entities by unsigned, undirected edges. These edges carry no cause/effect semantics and therefore cannot be used to make causal predictions. Such simple interactions can be detected in large quantities through high-throughput methods; hence existing data sources contain millions of interactions, an order of magnitude more than subsequent levels. Additionally, interaction networks are simple to align and integrate with one another, as each entity is typically represented by only one node in the graph. They are commonly used as a starting point in untargeted high-throughput assays where quantitative measurements are recorded for many entities and the researcher wants to look broadly at their data without necessarily seeking causal explanations.
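As a rough illustration of why such networks are simple to integrate, the sketch below stores each unsigned, undirected edge as an unordered pair, so merging two networks reduces to a set union. The entity names and the helper names `undirected` and `neighbors` are hypothetical, and the sketch assumes both networks already use the same naming convention.

```python
def undirected(edges):
    """Store each interaction as an unordered pair, so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in edges}


def neighbors(network, entity):
    """All entities that share an edge with `entity`."""
    return {other for edge in network if entity in edge for other in edge - {entity}}


# Two toy networks (hypothetical entities) reporting overlapping interactions.
net_a = undirected([("A", "B"), ("B", "C")])
net_b = undirected([("B", "A"), ("C", "D")])

# One node per entity means integration is a plain union of edge sets.
merged = net_a | net_b
print(len(merged))
print(sorted(neighbors(merged, "C")))
```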
Activity Flow networks, like interaction networks, typically contain one node for a given entity, allowing for easy integration of multiple networks so long as naming conventions for entities are consistent. Unlike interaction networks, however, they add a layer of cause/effect semantics in the form of directed and, sometimes, signed edges. For this reason, activity flow networks can be used to make causal predictions, and while they are considerably smaller than the interaction networks of level 2, they are expansive enough to be used for interrogating untargeted high-throughput datasets.
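One simple way to picture a causal prediction on directed, signed edges is sign propagation: the predicted change in a downstream node is the product of the edge signs along a path from the perturbed node. The sketch below is an illustrative assumption rather than the algorithm of any specific tool; `propagate_signs` and the entities `A`, `B`, `C` are hypothetical.

```python
from collections import deque


def propagate_signs(edges, source, direction=1):
    """Predict downstream changes after perturbing `source`.

    `edges` is an iterable of (source, target, sign) with sign +1 for
    activation and -1 for inhibition. Returns {node: set of predicted signs};
    a node holding both +1 and -1 is reached by conflicting paths.
    """
    outgoing = {}
    for src, tgt, sign in edges:
        outgoing.setdefault(src, []).append((tgt, sign))

    predicted = {source: {direction}}
    queue = deque([(source, direction)])
    while queue:
        node, sign = queue.popleft()
        for tgt, edge_sign in outgoing.get(node, []):
            new_sign = sign * edge_sign
            if new_sign not in predicted.setdefault(tgt, set()):
                predicted[tgt].add(new_sign)
                queue.append((tgt, new_sign))
    return predicted


# Toy network: A activates B, B inhibits C.
toy_edges = [("A", "B", 1), ("B", "C", -1)]
prediction = propagate_signs(toy_edges, "A", direction=1)
print(prediction)
```

Each (node, sign) pair is visited at most once, so the traversal terminates even when the network contains feedback loops.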
Process Description networks illustrate the mechanistic detail of how a reaction occurs. Because these models describe the stepwise events in a reaction, it is not uncommon for a single edge to be informed by multiple sources, making them very well grounded in the literature. They are considerably smaller than the preceding levels, given that most, if not all, of their curation must be done by hand. Unlike prior levels, these diagrams represent the same entity with multiple nodes, one for each of that entity's states through a sequence of events, including covalent modifications, cellular/subcellular locations, and/or complex memberships. This makes the integration of multiple process description networks a considerably more intensive exercise than integration at levels 2 and 3.
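The multiple-nodes-per-entity idea can be sketched as a small data structure in which a node is an entity together with its state. The class names `EntityState` and `Process`, the field choices, and the entities `X` and `K` are hypothetical and chosen only for illustration; complex membership is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityState:
    """One node: an entity in a specific modification state and location."""
    entity: str
    modification: str = "unmodified"
    location: str = "cytoplasm"


@dataclass(frozen=True)
class Process:
    """One stepwise event that consumes input states and produces output states."""
    inputs: tuple
    outputs: tuple
    catalysts: tuple = ()


def states_of(processes, entity):
    """Collect every distinct node that represents `entity`."""
    found = set()
    for process in processes:
        for state in process.inputs + process.outputs + process.catalysts:
            if state.entity == entity:
                found.add(state)
    return found


# Toy pathway: protein X is phosphorylated by kinase K, then moves to the nucleus.
x_cyto = EntityState("X")
x_phos = EntityState("X", modification="phosphorylated")
x_nuc = EntityState("X", modification="phosphorylated", location="nucleus")
kinase = EntityState("K")

toy_processes = [
    Process(inputs=(x_cyto,), outputs=(x_phos,), catalysts=(kinase,)),
    Process(inputs=(x_phos,), outputs=(x_nuc,)),
]

print(len(states_of(toy_processes, "X")))  # X appears as several nodes
```

Merging two such networks requires deciding which states of an entity are equivalent across sources, not just matching entity names, which is where the extra integration effort comes from.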
Quantitative Models were originally derived from canonical chemical equations. Like process description networks, these models explicitly represent the stepwise process of a reaction, but they extend that representation with quantitative factors such as concentrations, stoichiometry, and rate constants. An example of a quantitative model would be a metabolic pathway represented as a bipartite graph of substrates, products, catalysts, and reactants. They are often used to describe systems that are very intensively studied and are typically very small compared to the preceding levels, due to the volume of research required to inform their curation.
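To make the role of concentrations, stoichiometry, and rate constants concrete, the following is a minimal sketch assuming mass-action kinetics and a fixed-step forward Euler integrator. Both are simplifying assumptions for illustration; real quantitative models often use other rate laws and more robust solvers. The function `simulate`, the species names, and all numeric values are hypothetical.

```python
def simulate(initial, reactions, dt, steps):
    """Integrate mass-action kinetics with forward Euler.

    `initial` maps every species to its starting concentration.
    `reactions` is a list of (reactants, products, k), where reactants and
    products map species to stoichiometric coefficients and k is the rate
    constant.
    """
    conc = dict(initial)
    for _ in range(steps):
        delta = {species: 0.0 for species in conc}
        for reactants, products, k in reactions:
            # Mass-action rate: k times the product of reactant
            # concentrations, each raised to its stoichiometric coefficient.
            rate = k
            for species, stoich in reactants.items():
                rate *= conc[species] ** stoich
            for species, stoich in reactants.items():
                delta[species] -= stoich * rate * dt
            for species, stoich in products.items():
                delta[species] += stoich * rate * dt
        for species in conc:
            conc[species] += delta[species]
    return conc


# Toy system: A + B -> C with an arbitrary rate constant.
start = {"A": 1.0, "B": 0.5, "C": 0.0}
toy_reactions = [({"A": 1, "B": 1}, {"C": 1}, 1.0)]
final = simulate(start, toy_reactions, dt=0.01, steps=1000)
print(final)
```

Every parameter here (initial concentrations, stoichiometry, rate constant) would need experimental support in a real model, which illustrates why such models tend to be small.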
Some networks and models fall into two consecutive categories. For example, the networks used in PhosphoSitePlus [27] and CausalPath [28] are represented as activity flow networks; however, both describe posttranslational modifications, which lends them some of the mechanistic detail of a process description network. Molecular Interaction Maps (MIMs) [29] are equivalent in semantic detail to process description networks but retain an activity flow-like visualization. Finally, some large process description databases curate quantitative values such as enzymatic constants to allow for the construction of quantitative models [30].