Level 3: Activity Flow
CausalPath 28
CausalPath is an explanation-extraction algorithm that uses causal relationships from Pathway Commons37 as priors to build mechanistic explanations for patterns in proteomic, phosphoproteomic and transcriptomic datasets. CausalPath generates causal hypotheses that explain either the differences between comparable datasets (for example, biopsies taken under different conditions or at different time points) or the covariation of measurements across a cohort. These explanations are presented as an activity flow subnetwork, which can be expanded into a more detailed process description network. The method mimics a biologist’s traditional approach of explaining changes in data with prior knowledge, but applies it at the scale of hundreds of thousands of reactions.
CausalPath uses 12 predefined patterns that describe causal relationships between biological entities in the network. For example, if a kinase phosphorylates another protein, the kinase’s abundance or activating phosphorylation is expected to correlate with the phosphorylation of the target protein. Using these patterns, CausalPath assembles an activity flow network of the causal relationships supported by the proteomic, phosphoproteomic and transcriptomic data.
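The core test behind such a pattern can be illustrated with a short sketch. The Python below is not CausalPath's implementation; it assumes a simplified, hypothetical representation in which each prior relation has a source, a target and a sign (+1 for phosphorylation, -1 for dephosphorylation), and it checks whether the data are sign-consistent with that relation, either as a correlation across a cohort or as changes between two conditions. The `Relation` class, the function names and the `min_abs_r` cutoff are illustrative assumptions, and a full analysis would also need to account for statistical significance, which is omitted here.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Relation:
    """Hypothetical prior relation: source -> target with a sign
    (+1 = phosphorylates, -1 = dephosphorylates)."""
    source: str
    target: str
    sign: int


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def explains_correlation(rel, source_activity, target_site, min_abs_r=0.5):
    """Correlation mode: across a cohort, the source's activity proxy (abundance or
    an activating phosphosite) should correlate with the target phosphosite, with
    the sign the relation predicts. min_abs_r is an illustrative cutoff."""
    r = pearson(source_activity, target_site)
    return abs(r) >= min_abs_r and r * rel.sign > 0


def explains_comparison(rel, source_change, target_change):
    """Comparison mode: changes between two conditions (e.g. signed log fold
    changes already judged significant) must agree in sign with the relation.
    A zero change on either side supports nothing."""
    return source_change * rel.sign * target_change > 0


# Illustrative data for one hypothetical kinase -> substrate relation.
rel = Relation("KINASE_A", "SUBSTRATE_B", +1)
kinase_site = [0.1, 0.5, 0.9, 1.4, 2.0]
substrate_site = [0.2, 0.4, 1.1, 1.3, 2.2]
print(explains_correlation(rel, kinase_site, substrate_site))  # True (r ≈ 0.98)
print(explains_comparison(rel, 1.2, -0.8))  # False: target moved the wrong way
```

Encoding the relation as a sign lets both modes share one rule: the observed effect must have the sign the prior predicts, so a dephosphorylation relation (sign -1) is supported by anticorrelation or by opposite-direction changes.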
CausalPath was applied to several publicly available datasets covering a wide range of scenarios and biological questions. In a set of time-resolved epidermal growth factor (EGF) stimulation experiments, CausalPath detected EGFR activation and downstream signaling through MAPKs, including feedback inhibition of EGFR. Ligand-induced and drug-inhibited cell-line experiments were used to estimate the precision of CausalPath’s predictions. In CPTAC (Clinical Proteomic Tumor Analysis Consortium) mass spectrometry proteomic datasets for ovarian and breast cancer, CausalPath elucidated general and subtype-specific signaling, as well as regulators of well-known cancer proteins. In RPPA (Reverse Phase Protein Array) datasets from 32 TCGA (The Cancer Genome Atlas) cancer studies, CausalPath found a core signaling network that recurs across many cancer types.
CoPPNet 38
CoPPNet is a phenotype prediction tool that uses level 3 networks for unsupervised subtyping of cancer. It first constructs a functional network of phosphorylation sites based on their co-phosphorylation patterns and then identifies relevant subnetworks that correlate with subtypes.
The method first constructs a PhosphoSite Functional Association (PSFA) network that models potential functional relationships between pairs of phosphosites. Edges are inferred from existing databases: PTMcode for functional, structural and evolutionary associations, PhosphoSitePlus for kinase-substrate associations (which are also used to infer pairs of sites that share a kinase), and BioGRID for protein-protein interactions. Data from mass spectrometry (MS)-based phosphoproteomics assays are then incorporated by using biweight midcorrelation to assess the co-phosphorylation (Co-P) of phosphosite pairs connected in the PSFA network, yielding a weighted PSFA network. Finally, subnetworks enriched in highly co-phosphorylated phosphosite pairs are extracted: a greedy algorithm searches the weighted PSFA network for subnetworks that maximize the Co-P score, producing a ranked list of subnetworks called Co-P modules. The modules are then assessed for statistical significance, subtype specificity, predictive ability and reproducibility.
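The two computational steps can be sketched as follows, under stated assumptions. The `bicor` function follows the standard definition of biweight midcorrelation (median-centering, then Tukey biweight weights with a cutoff at 9 median absolute deviations). The Co-P score used here, the sum of (weight - tau) over edges inside a module with a hypothetical background level tau = 0.5, and the seed-and-grow stopping rule are illustrative assumptions, not CoPPNet's published scoring; the site names and the dictionary data layout are also hypothetical.

```python
from math import sqrt
from statistics import median


def _biweight_scaled(x):
    """Median-center x, down-weight outliers with Tukey's biweight, unit-normalize."""
    med = median(x)
    mad = median(abs(v - med) for v in x)
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    terms = []
    for v in x:
        u = (v - med) / (9 * mad)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0  # beyond 9 MADs: weight 0
        terms.append((v - med) * w)
    norm = sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length samples."""
    return sum(a * b for a, b in zip(_biweight_scaled(x), _biweight_scaled(y)))


def weight_psfa(psfa_edges, phospho):
    """Attach a bicor weight to every PSFA edge whose two sites were both quantified."""
    return {(a, b): bicor(phospho[a], phospho[b])
            for a, b in psfa_edges if a in phospho and b in phospho}


def module_score(nodes, weights, tau):
    """Hypothetical Co-P score: sum of (weight - tau) over edges inside the module."""
    return sum(w - tau for (a, b), w in weights.items() if a in nodes and b in nodes)


def greedy_module(weights, seed, tau=0.5, max_size=10):
    """Grow a module from a seed edge, adding the neighboring site that most
    improves the score; stop when no single addition helps."""
    module = set(seed)
    score = module_score(module, weights, tau)
    while len(module) < max_size:
        # Sites joined to the module by at least one edge (sorted for determinism).
        frontier = sorted({b if a in module else a
                           for (a, b) in weights if (a in module) != (b in module)})
        if not frontier:
            break
        best = max(frontier, key=lambda n: module_score(module | {n}, weights, tau))
        best_score = module_score(module | {best}, weights, tau)
        if best_score <= score:
            break
        module.add(best)
        score = best_score
    return module, score


def ranked_modules(weights, tau=0.5, max_size=10):
    """Grow a module from every edge and rank the distinct modules by score."""
    found = {}
    for seed in weights:
        nodes, score = greedy_module(weights, seed, tau, max_size)
        found[frozenset(nodes)] = score
    return sorted(found.items(), key=lambda kv: kv[1], reverse=True)


# Hand-made edge weights for four hypothetical sites.
weights = {("A", "B"): 0.90, ("B", "C"): 0.95, ("A", "C"): 0.92, ("C", "D"): 0.10}
for nodes, score in ranked_modules(weights):
    print(sorted(nodes), round(score, 2))
# ['A', 'B', 'C'] 1.27
# ['A', 'B', 'C', 'D'] 0.87
```

Subtracting tau keeps the objective from simply rewarding size; with a plain mean of edge weights, a module grown from the strongest edge could never improve on its seed, so the search would stop immediately.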
CoPPNet was applied to two independent breast cancer phosphoproteomic datasets. The phosphorylation patterns of the identified Co-P modules correlated strongly with known subtypes (luminal vs. basal), and the Co-P modules were reproducible across datasets from different studies.
IntOMICS 39
IntOMICS is a Bayesian framework that reconstructs gene regulatory networks by integrating multi-omic data, including gene expression, DNA methylation and copy number variation, with prior knowledge: regulatory relationships from KEGG and transcription factor-target gene associations from ENCODE. It is a network inference algorithm that produces level 3 representations.
The IntOMICS framework builds on the Werhli and Husmeier (W&H) algorithm40, which encodes each data source in a separate energy function. IntOMICS integrates the omics data by combining these energy functions in a Gibbs distribution. The effects of multiple upstream controllers are treated as additive. The inverse temperature hyperparameter of each source is tuned by sampling from the posterior distribution with Markov chain Monte Carlo (MCMC). Unlike the original W&H algorithm, IntOMICS uses adaptive MCMC simulation and Markov blanket resampling to speed up MCMC convergence.
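The W&H prior over network structures can be written as P(G | β) = exp(-Σ_k β_k E_k(G)) / Z(β), where each source k contributes an energy E_k(G) = Σ_{i≠j} |B_k,ij - G_ij| measuring how far the adjacency matrix G is from that source's edge-belief matrix B_k (entries in [0, 1], with 0.5 meaning no information), and β_k is that source's inverse temperature. The sketch below computes these quantities in plain Python. The list-of-lists matrix layout and the function names are assumptions for illustration, and it covers only the structure prior, not the likelihood, the adaptive MCMC or the Markov blanket resampling that IntOMICS adds.

```python
from math import exp


def energy(graph, belief):
    """Energy of one prior source: sum over ordered pairs i != j of |B_ij - G_ij|.
    graph[i][j] is 1 if the edge i -> j is present, else 0; belief[i][j] in [0, 1]
    is that source's confidence in the edge (0.5 = no information)."""
    n = len(graph)
    return sum(abs(belief[i][j] - graph[i][j])
               for i in range(n) for j in range(n) if i != j)


def log_prior_unnormalised(graph, beliefs, betas):
    """log P(G | beta) without the -log Z(beta) term: -sum_k beta_k * E_k(G)."""
    return -sum(beta * energy(graph, B) for B, beta in zip(beliefs, betas))


def prior_ratio(g_new, g_old, beliefs, betas):
    """P(g_new | beta) / P(g_old | beta). Z(beta) cancels for a fixed beta, so this
    is the prior factor of a structure-MCMC acceptance ratio."""
    return exp(log_prior_unnormalised(g_new, beliefs, betas)
               - log_prior_unnormalised(g_old, beliefs, betas))


# Two genes; one hypothetical prior source strongly supports gene 0 -> gene 1.
B = [[0.5, 0.9],
     [0.1, 0.5]]
empty = [[0, 0], [0, 0]]
with_edge = [[0, 1], [0, 0]]
print(prior_ratio(with_edge, empty, [B], [2.0]))  # exp(2.0 * 0.8) ≈ 4.95
```

For a fixed β the partition function cancels when two graphs are compared, which is why graph moves are cheap to score; updating β itself does require Z(β) or an approximation of it.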
For validation and comparison, the authors used IntOMICS to study the mechanism of chemoresistance in primary colon cancer samples from a randomized phase III clinical trial. Their goal was to identify downstream mediators of ABCG2, which has been shown to contribute to chemoresistance. They compared the network generated by IntOMICS with those from an unmodified implementation of the W&H algorithm and from two other multi-omic integration frameworks, RACER and KiMONo. IntOMICS nominated more downstream mediators of ABCG2 than these methods; these candidates may be important for chemoresistance and survival in colon cancer.