Figure 2. The 5 levels of network models. Scope refers
generally to the size of networks and the volume of interactions
recorded at that level. Mechanistic detail refers to whether the
stepwise processes of a reaction are explicitly given in the network
model. Causality refers to whether the network model can be used to make
causal inferences that can be statistically interrogated.
Gene Sets, as the name implies, are curated lists of genes
grouped by association with a particular phenotypic outcome, molecular
pathway, or cellular event. Gene sets, although not networks per se, are
often derived from network representations, such as boundaries of KEGG
pathways. Pathway boundaries are fiat boundaries [24], induced primarily through human
demarcation. For example, despite covering the same biological
processes, KEGG pathways contain 4 times more entities on average
than BioCyc [25] pathways, primarily due to
differences in curation guidelines. The two sources also yield substantially
different results when these fiat boundaries are used as input for gene
set enrichment tasks [26]. Although they encompass well-described
biological mechanisms, gene sets do not contain mechanistic
detail in the form of directed and/or signed edges. Approaches at this
level typically perform explanation/extraction, which involves
testing for statistical enrichment of gene sets or their components to
propose explanations for observed cellular behavior, e.g. highlighting
the most dramatically enriched pathway in a cancer biopsy to determine
possible therapeutic targets. This can also be extended to a phenotype
prediction task if the gene set describes a particular phenotype, e.g. a
gene set composed of markers for epithelial to mesenchymal transition in
breast cancer cells.
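
As an illustration of the enrichment testing described above, the following minimal sketch computes a one-sided hypergeometric (over-representation) p-value. The hypergeometric test is one common choice for this task and is used here as an assumption, not as the method of any particular tool; the function name `enrichment_p_value` and the gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, gene_set, background):
    """One-sided hypergeometric p-value for over-representation.

    hits:       genes flagged in the experiment (e.g. differentially expressed)
    gene_set:   curated gene set being tested
    background: all genes that could have been flagged
    """
    background = set(background)
    hits = set(hits) & background
    gene_set = set(gene_set) & background

    N = len(background)        # size of the background
    K = len(gene_set)          # gene-set members present in the background
    n = len(hits)              # number of flagged genes
    k = len(hits & gene_set)   # flagged genes that belong to the gene set

    # P(X >= k) when n genes are drawn without replacement from N,
    # of which K belong to the gene set.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favourable / comb(N, n)


# Hypothetical toy data: 20 background genes, a 5-gene set, 4 flagged genes.
background = {f"g{i}" for i in range(20)}
gene_set = {f"g{i}" for i in range(5)}
hits = {"g0", "g1", "g2", "g10"}

p = enrichment_p_value(hits, gene_set, background)
print(p)  # 155 / 4845, about 0.032
```

Because the test depends on which genes fall inside the set, changing the fiat boundary of a pathway changes `K` and `k` and therefore the resulting p-value.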
Interaction Networks represent interactions between
biological entities by unsigned, undirected edges. These edges do not
contain any cause/effect semantics and therefore cannot be used to make
causal predictions. These simple interactions can be detected in large
quantities through high-throughput methods, hence there are millions
of interactions present in existing data sources, an order of magnitude
more than at subsequent levels. Additionally, interaction networks are
simple to align and integrate with one another, as each entity is
typically represented by only one node in the graph. They are commonly
used as a starting point in untargeted high-throughput assays where
quantitative measurements are recorded for many entities and the
researcher wants to look broadly at their data without necessarily
seeking causal explanations.
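
The ease of integration noted above can be sketched as follows. This is a minimal illustration that assumes both sources already use the same entity identifiers; the function name `merge_interaction_networks` is hypothetical and the edge lists are toy examples, not data from any cited resource.

```python
def merge_interaction_networks(*networks):
    """Merge undirected, unsigned edge lists into a single edge set.

    Each edge is stored as a frozenset, so (A, B) and (B, A) collapse
    into the same undirected interaction.
    """
    merged = set()
    for edges in networks:
        for a, b in edges:
            merged.add(frozenset((a, b)))
    return merged


# Toy edge lists from two hypothetical sources.
source_1 = [("TP53", "MDM2"), ("EGFR", "GRB2")]
source_2 = [("MDM2", "TP53"), ("GRB2", "SOS1")]

merged = merge_interaction_networks(source_1, source_2)
print(len(merged))  # 3: the TP53/MDM2 edge appears in both sources
```

Since each entity is a single node and edges carry no direction or sign, integration reduces to a set union once identifiers are harmonized.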
Activity Flow networks, like interaction networks,
typically contain one node for a given entity, allowing for easy
integration of multiple networks so long as naming conventions for
entities are consistent. Unlike interaction networks, however, activity flow networks add a layer
of cause/effect semantics in the form of directed and, sometimes, signed
edges. For this reason, activity flow networks can be used for making
causal predictions, and while these networks are considerably smaller
than those at level 2, they are expansive enough that they can still be used for
interrogating untargeted high-throughput datasets.
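
To make the role of directed, signed edges concrete, the sketch below predicts the qualitative effect of an upstream entity on a downstream one by multiplying edge signs along a path. This is a simplified illustration under the assumption that every edge is signed (+1 for activation, -1 for inhibition); the entity names and the function `path_sign` are hypothetical.

```python
def path_sign(signed_edges, path):
    """Return the net sign of a directed path.

    signed_edges maps (source, target) to +1 (activates) or -1 (inhibits).
    A net sign of +1 predicts that increasing the first entity increases
    the last one; -1 predicts a decrease.
    """
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= signed_edges[(source, target)]
    return sign


# Hypothetical activity flow network: A activates B, B inhibits C, C inhibits D.
edges = {("A", "B"): +1, ("B", "C"): -1, ("C", "D"): -1}

print(path_sign(edges, ["A", "B", "C"]))       # -1
print(path_sign(edges, ["A", "B", "C", "D"]))  # 1
```

The same calculation is not possible on an interaction network, because its edges record neither which entity acts on which nor the sign of the effect.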
Process Description networks illustrate the mechanistic
detail of how a reaction occurs. Because these models describe the
stepwise events in a reaction, a single edge is often
informed by multiple sources, making these networks very well grounded in the
literature. They are considerably smaller than networks at the preceding levels, given that most, if not all,
of their curation must be done by hand. Unlike networks at prior levels, these
diagrams represent the same entity with multiple nodes, corresponding to
each of that entity’s states through a sequence of events, including
covalent modifications, cellular/subcellular locations, and/or complex
memberships. This makes the integration of multiple process description
networks a considerably more intensive exercise relative to levels 2 and
3.
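
The one-entity, many-nodes property can be illustrated with a small data structure. This is a sketch under stated assumptions, not the schema of any particular database: the classes `EntityState` and `Process`, the helper `states_by_entity`, and the example entities are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityState:
    """One node: an entity in a specific modification state and location."""
    entity: str
    modification: str = "unmodified"
    location: str = "cytoplasm"


@dataclass(frozen=True)
class Process:
    """One reaction step that converts input states into output states."""
    inputs: tuple
    outputs: tuple
    catalysts: tuple = ()


def states_by_entity(processes):
    """Group every node that appears in the processes by its entity."""
    grouped = defaultdict(set)
    for process in processes:
        for state in process.inputs + process.outputs + process.catalysts:
            grouped[state.entity].add(state)
    return dict(grouped)


# Hypothetical two-step sequence: phosphorylation, then nuclear translocation.
erk = EntityState("ERK")
erk_p = EntityState("ERK", modification="phosphorylated")
erk_p_nuc = EntityState("ERK", modification="phosphorylated", location="nucleus")
mek_p = EntityState("MEK", modification="phosphorylated")

processes = [
    Process(inputs=(erk,), outputs=(erk_p,), catalysts=(mek_p,)),
    Process(inputs=(erk_p,), outputs=(erk_p_nuc,)),
]

states = states_by_entity(processes)
print(len(states["ERK"]))  # 3 nodes for a single entity
```

Merging two such networks requires matching nodes on entity, modification, and location together, which is why integration is harder here than at levels 2 and 3.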
Quantitative Models were originally derived from canonical
chemical equations. Like process description networks, these models
explicitly represent the stepwise process of a
reaction, but they also include quantitative factors such as
concentrations, stoichiometry, and rate constants. An example of a
quantitative model would be a metabolic pathway represented as a
bipartite graph of substrates, products, catalysts, and reactants. They
are often used to describe systems which are very intensively studied
and are typically very small compared to the preceding levels, due to
the volume of research required to inform their curation.
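
As a minimal sketch of what the added quantitative factors enable, the code below simulates a single hypothetical reaction A -> B with 1:1 stoichiometry under first-order mass-action kinetics, d[A]/dt = -k[A]. The rate law, the forward-Euler integration scheme, the parameter values, and the function name `simulate_first_order` are all assumptions chosen for illustration.

```python
import math


def simulate_first_order(a0, b0, k, dt, steps):
    """Integrate A -> B with rate constant k using forward Euler steps."""
    a, b = a0, b0
    for _ in range(steps):
        converted = k * a * dt  # amount of A converted during this step
        a -= converted
        b += converted          # 1:1 stoichiometry keeps a + b constant
    return a, b


# Hypothetical parameters: [A]0 = 1.0, k = 0.5 per unit time, total time 2.0.
a_end, b_end = simulate_first_order(a0=1.0, b0=0.0, k=0.5, dt=0.001, steps=2000)

# Closed-form solution for comparison: [A](t) = [A]0 * exp(-k * t).
analytic_a = 1.0 * math.exp(-0.5 * 2.0)
print(a_end, analytic_a)
```

Each parameter in such a model (initial concentrations, rate constant) must be measured or estimated, which is consistent with these models being small and limited to intensively studied systems.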
Some networks and models fall into two consecutive categories. For
example, the networks used in PhosphoSitePlus [27] and
CausalPath [28] are represented as activity flow
networks; however, both describe post-translational modifications, which
lends them some of the mechanistic detail of a process description network.
Molecular Interaction Maps (MIMs) [29] are equivalent in
semantic detail to process description but retain an activity flow-like
visualization. Finally, some large process description databases curate
quantitative values such as enzymatic constants to allow for
construction of quantitative models [30].