Utility of Networks
Omics profiles offer a molecular snapshot of a biological system under a
set of conditions16. Molecular structures commonly
profiled by omics techniques include the genome (genomics), RNA
(transcriptomics), proteins and their post-translational modifications
(proteomics), metabolites (metabolomics), and the epigenome
(epigenomics)17. Some modalities can even be profiled
at the level of a single cell, providing much finer resolution. These
technologies and the interpretation of their results with network-based
methods are a driving force in the field of systems biology. Using
networks, we can generate conjectures about the patterns in these highly
complex datasets and understand which observed relationships can be
explained by existing knowledge and which point to novel findings.
This “explainability” is key for the iterative process of
scientific discovery. If an algorithm can make somewhat accurate
predictions about the behavior of a system but cannot point to the
components that are likely to drive the observed behavior, then the
predictions can only be tested phenomenologically and not
mechanistically. This is a limiting and inefficient way of analyzing a
complex combinatorial system. There is no better example of this than
commercial drug discovery, which relies on very large phenomenological
screens to select candidates for clinical trials. Despite substantial
efficiency gains, between 1950 and 2010 the costs of research and
development per approved drug approximately doubled every nine
years18 as we try to tackle increasingly complex
diseases.
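As a rough, back-of-the-envelope illustration of what this doubling time implies over the period cited above:

\[
\frac{\text{cost per approved drug in } 2010}{\text{cost per approved drug in } 1950} \approx 2^{(2010 - 1950)/9} = 2^{60/9} \approx 100,
\]

i.e., roughly a hundred-fold increase over those six decades.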
A related benefit of grounded, mechanistic inference is the ability to
“reason” about the system’s response to a previously unknown
perturbation, such as a new drug combination or a mutation. This is an
extension of a biologist’s intuition (e.g., inhibiting the inhibitor of
a target protein will activate it), but it can be done at scale.
Additionally, it enables us to identify the reasons behind the failure
of our predictions.
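As a minimal sketch of how such reasoning can be automated (the network, node names, and signs below are purely illustrative and not drawn from any particular resource), signs can be propagated along paths in a signed interaction network, so that two successive inhibitions compose into an activation:

    # Each edge carries a sign: +1 for activation, -1 for inhibition.
    # Hypothetical example: a drug inhibits kinase_X, which inhibits protein_Y,
    # which in turn activates gene_Z.
    signed_edges = {
        ("drug", "kinase_X"): -1,
        ("kinase_X", "protein_Y"): -1,
        ("protein_Y", "gene_Z"): +1,
    }

    def path_effect(path, edges):
        """Multiply edge signs along a path: two inhibitions compose to an activation."""
        effect = 1
        for source, target in zip(path, path[1:]):
            effect *= edges[(source, target)]
        return effect

    effect = path_effect(["drug", "kinase_X", "protein_Y", "gene_Z"], signed_edges)
    print("predicted effect of drug on gene_Z:",
          "activation" if effect > 0 else "inhibition")  # prints: activation

Because each prediction in a sketch like this is tied to an explicit path, a failed prediction can be traced back to the specific edge responsible, mirroring the point above about identifying why predictions fail.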
A perhaps less appreciated aspect of network-based approaches is the use
of networks as prior information to restrict the search space of
statistical algorithms. When evaluating potential network models by
their ability to explain or fit a given dataset de novo, the number of
possible network models grows exponentially, \(O(2^{n^{2}})\), as a
function of the number \(n\) of nodes19. This leads to substantial
problems with model overfitting, multiple hypothesis testing
correction, and model degeneracy. Multiple hybrid methods have been
developed that use prior information probabilistically alongside de
novo inference to center the inferred or evaluated models on known
biology, which can restrict the search space substantially12.
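To make the scale of this search space concrete, here is a small, purely illustrative Python calculation contrasting the \(2^{n^{2}}\) unconstrained model count with the count obtained when a prior network fixes which edges are even considered (the node and edge numbers are arbitrary choices for illustration):

    # With n nodes there are n*n possible directed edges (self-loops included),
    # and each can be present or absent, giving 2**(n**2) candidate models.
    def n_models_de_novo(n):
        return 2 ** (n ** 2)

    # If a prior network allows only k candidate edges, only 2**k models remain.
    def n_models_with_prior(k):
        return 2 ** k

    n = 10
    print(f"de novo, {n} nodes: {n_models_de_novo(n):.2e} candidate models")  # ~1.27e+30
    print(f"prior allowing 40 edges: {n_models_with_prior(40):.2e} models")   # ~1.10e+12

Even for a ten-node system, constraining the candidate edges with a prior reduces the model space by many orders of magnitude; the hybrid methods cited above apply this idea probabilistically rather than as a hard constraint.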
The combined effect of these advantages is an incremental, iterative
discovery process that can be done at scale. This is crucial,
given the rapid evolution of omics technologies and the ever-increasing
volume of omics data.