Introduction
François Jacob concluded his 1965 Nobel lecture (the prize, shared with Jacques Monod and André Lwoff, recognized their work on the regulation of the lac operon) with a vision: “We do not know how molecules find each other, recognize each other, and combine to constitute the regulatory network … What is clear, however, is that the problems to be solved by cellular biology and genetics in the years to come tend increasingly to merge with those in which biochemistry and physical chemistry are involved.” The idea of a network model, in which genes and gene products are linked by molecular processes, was thus present from the very first days of molecular biology. Because of the sheer complexity of biological systems, biologists have traditionally employed reductionist approaches, in which individual fragments of cellular processes are isolated and characterized. An implicit goal of this approach has been to assemble a network model piecemeal from these reductionist findings, one that would eventually explain and predict the behavior of the biological system at large.
More than half a century later, tens of millions of such reductionist findings have accumulated in the literature. Multiple databases and information systems have been developed to capture the pathway information reported in the scientific literature and present it in computable format1. Millions of interactions, molecular processes, and relationships are curated as networks, including metabolic pathways, signaling pathways, gene regulatory networks, molecular interaction networks, and genetic-interaction networks.
In parallel, our ability to systematically profile cellular processes has grown with the development of modern omics technologies. We now have a range of genomic, transcriptomic, metabolomic, and proteomic techniques at our disposal. We can deeply profile a cellular system in a given context, with an increasing ability to do so spatially and at the level of single cells. These technologies allow us to generate system-scale profiles without necessarily starting from a specific hypothesis or isolating a specific component, challenging the traditional piecemeal method. In most cases, data-driven approaches no longer seek explicit biological grounding for their findings: clusters, subtypes, and signatures replace mechanisms and pathways. The perceived incompatibility between the “hypothesis-driven, reductionist” and “data-driven, system-scale” camps has led to one of the most polarizing epistemological debates in modern molecular biology2–4.
Is this truly a fundamental divide, or can we have our cake and eat it too? To bridge the gap, we need to computationally combine these fragments of prior information with omics profiles to generate and test mechanistic, falsifiable conjectures at scale. Over the last two decades, thousands of algorithms and methods have been developed in the field of network biology to address sub-problems of this grand challenge, using diverse types of omics data and prior knowledge. Given the multiplicity of data modalities, prior information sources, and tasks, it is often difficult to assess which algorithms suit which biological questions and how they relate to one another. Here we present a framework that organizes these methodologies into broad categories based on the prior information they use and the computational task they target, and we review a few examples from each category. Our goal in this review is to give readers a foundational understanding of the different types of networks, and a mental map to help match their needs with the available tools and algorithms.
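To make the shared pattern concrete before the survey begins, the sketch below shows, in deliberately simplified form, what "combining prior information with omics profiles" can look like computationally: omics-derived gene scores are overlaid on a prior-knowledge interaction network, and the network structure is used to reinterpret them. The toy interaction list, the scores, and the single smoothing step are hypothetical placeholders for illustration, not any specific published algorithm.

```python
# A minimal sketch of the pattern shared by many network-biology methods:
# overlay omics-derived scores on a prior-knowledge network, then let the
# network structure re-rank genes. Everything here (the toy interaction
# list, the scores, the single smoothing step) is a hypothetical
# illustration, not a specific published algorithm.
import networkx as nx

# Prior knowledge: a toy protein-protein interaction network.
ppi = nx.Graph([
    ("TP53", "MDM2"), ("TP53", "ATM"), ("MDM2", "UBE3A"),
    ("ATM", "CHEK2"), ("CHEK2", "BRCA1"), ("BRCA1", "BARD1"),
])

# Omics profile: e.g., absolute log fold-changes from a differential
# expression experiment (made-up values). Genes in the network that
# were not measured default to 0 below.
scores = {"TP53": 2.1, "MDM2": 0.3, "ATM": 1.7,
          "CHEK2": 0.2, "BRCA1": 1.9, "BARD1": 0.1}

# One round of neighborhood smoothing: blend each gene's own evidence
# with the mean score of its interaction partners, so genes sitting in
# high-scoring neighborhoods rise even if their own measurement is weak.
alpha = 0.5  # weight on a gene's own evidence vs. its neighborhood
smoothed = {
    g: alpha * scores.get(g, 0.0)
       + (1 - alpha) * sum(scores.get(n, 0.0) for n in ppi[g]) / ppi.degree(g)
    for g in ppi.nodes
}

for gene, s in sorted(smoothed.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {s:.2f}")
```

Methods of this family typically replace the single averaging step with iterative propagation, diffusion kernels, or subnetwork optimization over networks of thousands of nodes, but the division of labor between the prior network and the omics overlay remains the same; the categories introduced below differ mainly in which side of that division they elaborate.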