Utility of Networks
Omics profiles offer a molecular snapshot of a biological system under a set of conditions16. Molecular layers commonly profiled by omics techniques include the genome (genomics), RNA (transcriptomics), proteins and their post-translational modifications (proteomics), metabolites (metabolomics), and the epigenome (epigenomics)17. Some modalities can even be profiled at the level of a single cell, providing much finer resolution. These technologies, and the interpretation of their results with network-based methods, are a driving force in the field of systems biology. Using networks, we can generate hypotheses about the patterns in these highly complex datasets and distinguish observed relationships that are explained by existing knowledge from those that point to novel findings.
This “explainability” is key for the iterative process of scientific discovery. If an algorithm can make somewhat accurate predictions about the behavior of a system but cannot point to the components likely to drive that behavior, then its predictions can only be tested phenomenologically, not mechanistically. This is a limiting and inefficient way of analyzing a complex combinatorial system. There is no better example of this than commercial drug discovery, which relies on very large phenomenological screens to nominate candidates for clinical trials. Despite substantial efficiency gains, between 1950 and 2010 the cost of research and development per approved drug approximately doubled every nine years18 as we attempt to tackle increasingly complex diseases.
A related benefit of grounded, mechanistic inference is the ability to “reason” about the system’s response to a previously unseen perturbation, such as a new drug combination or a mutation. This is an extension of a biologist’s intuition (e.g., inhibiting the inhibitor of a target protein will activate it) but can be done at scale. It also enables us to identify the reasons behind failed predictions.
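This kind of reasoning can be made explicit. The following is a minimal, hypothetical sketch (the network, node names, and helper function are illustrative assumptions, not taken from any particular tool) of sign propagation over a signed interaction network, which formalizes “inhibiting the inhibitor activates the target” so it can be applied systematically across many paths:

```python
# Minimal sketch of network-based "reasoning": the predicted effect of
# perturbing an upstream node on a downstream node is the product of the
# edge signs along the connecting path. Network and names are hypothetical.

# Signed, directed edges: (source, target) -> sign (-1 = inhibits, +1 = activates)
edges = {
    ("drug", "kinase_inhibitor"): -1,           # the drug inhibits the inhibitor
    ("kinase_inhibitor", "target_kinase"): -1,  # the inhibitor inhibits the target
}

def path_effect(path, edges):
    """Compose edge signs along a path: two inhibitions yield net activation."""
    effect = 1
    for src, dst in zip(path, path[1:]):
        effect *= edges[(src, dst)]
    return effect

# Inhibiting the inhibitor of the target kinase activates it: effect = +1
print(path_effect(["drug", "kinase_inhibitor", "target_kinase"], edges))
```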
A perhaps less appreciated aspect of network-based approaches is the use of networks as prior information to restrict the search space of statistical algorithms. When evaluating potential network models by their ability to explain or fit a given dataset de novo, the number of possible network models grows exponentially, \(O(2^{n^{2}})\), as a function of the number of nodes \(n\)19. This leads to substantial problems with model overfitting, multiple hypothesis testing correction, and model degeneracy. Multiple hybrid methods have been developed that combine prior information, used probabilistically, with de novo inference to center the inferred or evaluated models around known biology, which can restrict the search space substantially12.
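To give a sense of scale, a short worked calculation (assuming directed networks in which every ordered pair of nodes, including self-loops, may or may not be connected by an edge):

\[
\underbrace{n^{2}}_{\text{candidate edges}} \;\Rightarrow\; \underbrace{2^{n^{2}}}_{\text{candidate networks}}, \qquad n = 10 \;\Rightarrow\; 2^{100} \approx 1.3 \times 10^{30}.
\]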
The combined effect of these advantages is an incremental, iterative discovery process that can be carried out at scale. This is crucial, given the rapid evolution of omics technologies and the ever-increasing volume of omics data.