Introduction
François Jacob concluded his 1965 Nobel lecture (for the prize he
shared with Monod and Lwoff for their work on the lac operon) with a
vision: “We do not know how molecules find each other, recognize each
other, and combine to constitute the regulatory network … What is clear,
however, is that the problems to be solved by cellular biology and
genetics in the years to come tend increasingly to merge with those in
which biochemistry and physical chemistry are involved.” The idea of a
network model, where
genes and gene products are linked by molecular processes, was present
from the very first days of molecular biology. Due to the sheer
complexity of biological systems, biologists have traditionally employed
reductionist approaches where different fragments of cellular processes
are isolated and characterized. An implicit goal of this approach has
been to assemble a network model piecemeal from these reductionist
findings, one that would eventually explain and predict the behavior of
the biological system at large.
More than half a century later, tens of millions of such reductionist
findings have accumulated in the literature. Multiple databases and
information systems have been developed to capture this pathway
information and present it in a computable
format [1]. Millions of interactions,
molecular processes and relationships are curated as networks, including
metabolic pathways, signaling pathways, gene regulatory networks,
molecular interaction networks, and genetic-interaction networks.
In parallel, our ability to systematically profile cellular processes
has grown with the development of modern omics technologies. We now have
a range of genomic, transcriptomic, metabolomic, and proteomic
techniques at our disposal. We can deeply profile a cellular system
in a given context, with an
increasing ability to do so spatially and at the level of single cells.
These technologies allow us to generate system-scale profiles without
necessarily starting with a specific hypothesis or isolating a specific
component—challenging the traditional piecemeal method. In most cases,
data-driven approaches no longer seek explicit biological grounding
for their findings: clusters, subtypes, and signatures replace mechanisms
and pathways. The perceived incompatibility between the “hypothesis-driven,
reductionist” and “data-driven, system-scale” camps has led to one of the
most polarizing epistemological debates in modern molecular biology [2–4].
Is this truly a fundamental divide, or can we have our cake and eat
it too? To bridge this gap, we need to computationally combine these
prior information fragments with omics profiles to generate and test
mechanistic, falsifiable conjectures at scale. Over the last two
decades, thousands of algorithms and methods have been created in the
field of network biology to address various sub-problems of this grand
challenge, using diverse types of omics data and prior knowledge. Given
the multitude of data modalities, prior information sources, and tasks, it is
often difficult to assess which algorithms suit which biological
questions and how they relate to each other. Here we present a
framework to organize these methodologies into broad categories based on
their use of prior information and the computational task they target.
We also review a few examples from each category. Our goal in this
review is to give readers a foundational understanding of the different
types of networks, and a mental map to help match their needs with the
available tools and algorithms.