Level 3: Activity Flow
CausalPath 28
CausalPath is an explanation extraction algorithm which uses causal
relationships from Pathway Commons37 as priors to
extract a mechanistic explanation for the patterns in proteomics,
phospho-proteomics, and transcriptomics datasets. CausalPath produces
causal hypotheses about the differences between comparable datasets, for
example, biopsies from different conditions or timepoints, or the
covariance across a cohort. These explanations are presented as an
activity flow sub-network, which can also be expanded as a more detailed
process description network. The method mimics a biologist’s traditional
approach of explaining changes in data using prior knowledge, but does
this at the scale of hundreds of thousands of reactions.
CausalPath employs 12 pre-defined patterns that describe causal
relationships between biological entities in the network. For example, a
kinase phosphorylating another protein implies an expected correlation
between the kinase’s abundance or activating phosphorylation and the
phosphorylation of the target protein. Using these pre-defined
patterns, CausalPath assembles an activity flow network showing the
causal relationships supported by the proteomic, phosphoproteomic and
transcriptomic data.
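
The kind of check such a pattern implies can be sketched in a few lines. This is only an illustration of the idea, not CausalPath's data model or API: the `Relation` layout, the `explains` function, and the entity names and values are all hypothetical, and the real patterns also account for details such as whether a phosphosite is activating or inhibiting.

```python
from typing import Dict, NamedTuple


class Relation(NamedTuple):
    """A signed prior relation (hypothetical layout, for illustration only)."""
    source: str  # upstream entity, e.g. a kinase
    target: str  # downstream entity, e.g. a phosphosite
    sign: int    # +1: source is expected to increase target, -1: decrease


def explains(rel: Relation, change: Dict[str, int]) -> bool:
    """Return True if the observed change directions agree with the prior.

    `change` maps an entity to +1 (increased) or -1 (decreased). Entities
    missing from the mapping are treated as unmeasured or unchanged, so the
    relation cannot be supported by the data.
    """
    s = change.get(rel.source, 0)
    t = change.get(rel.target, 0)
    if s == 0 or t == 0:
        return False
    return s * rel.sign == t


# Toy priors and toy measurements (assumed values, not real data).
priors = [
    Relation("KIN_A", "SUB_B_pS10", +1),
    Relation("PPASE_C", "SUB_B_pS10", -1),
    Relation("KIN_A", "SUB_D_pT5", +1),
]
observed = {"KIN_A": +1, "SUB_B_pS10": +1, "PPASE_C": +1, "SUB_D_pT5": -1}

# Keep only the prior relations whose direction matches the data.
supported = [r for r in priors if explains(r, observed)]
```

In this toy case only the first relation survives: the kinase and its substrate site both increase, whereas the other two priors contradict the observed directions.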
CausalPath was applied to several publicly available datasets covering a
wide range of scenarios and biological questions. In a set of
time-resolved epidermal growth factor (EGF) stimulation experiments,
CausalPath detected EGFR activation via downstream signaling of MAPKs,
including feedback inhibition on EGFR. From ligand-induced and
drug-inhibited cell-line experiments, CausalPath estimated the precision
of its predictions. From CPTAC (Clinical Proteomic Tumor Analysis
Consortium) protein mass spectrometry datasets for ovarian and breast
cancer, CausalPath elucidated general and subtype-specific signaling, as
well as regulators of well-known cancer proteins. In RPPA (Reverse Phase
Protein Array) experimental datasets of 32 TCGA (The Cancer Genome Atlas)
cancer studies, CausalPath found a core signaling network that is
recurrently identified across many cancer types.
CoPPNet 38
CoPPNet is a phenotype prediction tool which uses level 3 networks to
accomplish unsupervised subtyping of cancer. CoPPNet first constructs a
functional network of phosphorylation sites based on their
co-phosphorylation patterns, and then identifies relevant subnetworks
that correlate to subtypes.
The method first constructs a PhosphoSite Functional Association (PSFA)
Network that models potential functional relationships between
phosphosite pairs. Edges are inferred using information from existing
databases: PTMcode is used for functional, structural and evolutionary
associations, PhosphoSitePlus for kinase-substrate associations and
inferring shared-kinase pairs, and BioGRID for protein-protein
interactions. Data from MS-based phospho-proteomics assays are then
incorporated using biweight midcorrelation to assess co-phosphorylation
(Co-P) of phosphosite pairs connected in the PSFA network, resulting in
a weighted PSFA network. Finally, subnetworks enriched in highly
co-phosphorylated phosphosite pairs are extracted. To achieve this, the
weighted PSFA network is searched for subnetworks using a greedy
algorithm to maximize Co-P score, resulting in a list of ranked
subnetworks referred to as Co-P modules. Modules are then assessed for
statistical significance, subtype specificity, predictive ability, and
reproducibility.
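
To make the edge-weighting step concrete, the following is a minimal sketch of biweight midcorrelation, a median-based, outlier-resistant correlation, applied to the edges of a small network. It uses the standard textbook definition of the statistic. The helper names (`bicor`, `weight_edges`), the phosphosite labels, and the intensity values are assumptions for illustration and are not taken from the CoPPNet implementation.

```python
import math
from statistics import median


def _biweight_unit_vector(values):
    """Median-centre, down-weight outliers, and scale to unit length."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; biweight midcorrelation is undefined")
    weighted = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Tukey biweight: points with |u| >= 1 receive zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted.append((v - med) * w)
    norm = math.sqrt(sum(d * d for d in weighted))
    return [d / norm for d in weighted]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length profiles, in [-1, 1]."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    xs = _biweight_unit_vector(x)
    ys = _biweight_unit_vector(y)
    return sum(a * b for a, b in zip(xs, ys))


def weight_edges(edges, profiles):
    """Attach a co-phosphorylation score to every edge of the network."""
    return {(a, b): bicor(profiles[a], profiles[b]) for a, b in edges}


# Toy phosphosite intensity profiles across six samples (assumed values).
profiles = {
    "P1_S10": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "P2_T20": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],
    "P3_Y30": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
}
edges = [("P1_S10", "P2_T20"), ("P1_S10", "P3_Y30")]
weights = weight_edges(edges, profiles)
```

A greedy module search could then start from the highest-scoring edge and keep adding neighbouring sites while the module's score improves. That step is omitted here because its exact scoring function is not specified above.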
CoPPNet was applied to two independent breast cancer phospho-proteomic
datasets. The phosphorylation patterns of identified Co-P modules were
found to correlate strongly with known subtypes (Luminal vs. Basal),
and Co-P modules were shown to be reproducible across datasets from
different studies.
IntOMICS 39
IntOMICS is a Bayesian framework that reconstructs gene regulatory
networks from integrated multi-omic data, including gene expression, DNA
methylation, and copy number variation data, as well as prior knowledge
from KEGG (regulatory relationships) and target gene-transcription
factor associations from ENCODE. This is a network inference algorithm
for level 3 representation.
The IntOMICS framework is based on the Werhli and Husmeier (W&H)
algorithm40, which encodes each omics data source as a
separate energy function. IntOMICS integrates the omics data by
combining the energy functions in a Gibbs distribution. Effects of
multiple upstream controllers are additive. The inverse temperature
hyperparameters for each source are tuned by sampling from the posterior
distribution with Markov chain Monte Carlo (MCMC). Unlike the original
W&H algorithm, IntOMICS uses an adaptive MCMC simulation and Markov
blanket resampling to improve the MCMC convergence speed.
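
The role of the energy functions and inverse temperatures can be illustrated with a small sketch. It assumes the common form of an energy-based structure prior, in which the energy of a candidate network is the summed absolute difference between its adjacency matrix and a matrix of prior confidences, and the prior probability is proportional to exp(-beta * energy). The function names, matrices, and beta value are hypothetical, and the exact energy terms used by IntOMICS may differ.

```python
def energy(graph, prior):
    """Disagreement between a 0/1 adjacency matrix and prior confidences.

    Both arguments are square nested lists; prior entries lie in [0, 1],
    with 0.5 meaning "no prior information" about that edge.
    """
    return sum(
        abs(b - g)
        for row_g, row_b in zip(graph, prior)
        for g, b in zip(row_g, row_b)
    )


def log_unnormalized_prior(graph, priors, betas):
    """Log of the Gibbs prior up to the normalising constant.

    Each knowledge source contributes its own energy, scaled by its own
    inverse temperature beta; a larger beta means that source is trusted more.
    """
    return -sum(b * energy(graph, p) for p, b in zip(priors, betas))


# Toy two-gene example (assumed values): the prior favours edge 0 -> 1.
prior = [[0.5, 0.9],
         [0.1, 0.5]]
g_a = [[0, 1],
       [0, 0]]  # contains the favoured edge 0 -> 1
g_b = [[0, 0],
       [1, 0]]  # contains the disfavoured edge 1 -> 0

lp_a = log_unnormalized_prior(g_a, [prior], [2.0])
lp_b = log_unnormalized_prior(g_b, [prior], [2.0])
```

Here `g_a` has the lower energy and therefore the higher prior probability. With beta set to 0 the prior becomes flat and both networks are equally likely, which is why tuning the inverse temperatures controls how much each knowledge source influences the inferred network.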
For validation and comparison, the authors used IntOMICS to understand
the mechanism of chemoresistance using primary colon cancer samples from
a randomized Phase III clinical trial. Their goal was to identify
downstream mediators of ABCG2, which has been shown to contribute
to chemoresistance. They compared the network generated from IntOMICS to
those from an unaltered implementation of the W&H algorithm as well as
two other multi-omic integration frameworks, RACER and KiMONo. IntOMICS
nominated more downstream mediators of ABCG2, which may be
important for chemoresistance and survival in colon cancer.