3.0 RNA-Seq analysis pipelines
Over the years, microarray and gene-chip technologies have provided
insight into the genetic changes in biological samples. However,
these techniques have known limitations in dynamic range, resolution
and accuracy.51 Advances in
transcriptome technology have allowed a deeper understanding of the
intricacies of gene expression regulation. In particular, high-throughput
RNA sequencing has made it possible to observe whole-transcriptome
variation, discover novel splice sites and splicing events, study the
functions of noncoding RNAs, and verify the construction and
annotation of complex genomes.52 It also helps to
qualitatively ascertain which RNA transcripts and RNA editing sites are
present, and to quantitatively determine how much of each transcript is
expressed.53 It is therefore important to give an overview of the
pipelines and workflows applied to bladder cancer RNA-Seq analyses.
A number of computational pipelines and workflows are being used for the
pre-processing of RNA-Seq data in cancer studies and other experimental
purposes.50,54,55 A typical RNA-Seq workflow consists
of seven steps: (1) pre-processing of raw data, (2) alignment of reads
to the reference, (3) transcriptome reconstruction, (4) quantification
at the transcript or gene level, (5) differential expression analysis,
(6) functional profiling and (7) advanced analysis (Figure
2).56 These stages of the RNA-Seq workflow, which
include quality control (QC) and data analysis, can be carried out using
a variety of computational platforms or tools. For example, reads
may be aligned using tools such as Spliced Transcripts
Alignment to a Reference (STAR) or TopHat.57,58 Read counts
can then be obtained from the aligned reads using either HTSeq or the
Rsubread R/Bioconductor package.59,60 The advantage of Rsubread
over HTSeq is that it is faster, requires less memory and
produces read-count summaries that are closer to the true
values.61,62
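
As a rough illustration of the alignment and counting stages described above, the following Python sketch assembles command lines for STAR and for featureCounts. featureCounts is used here as the command-line counterpart of the counting function in Rsubread; that substitution is an assumption made for illustration. The sketch only builds argument lists and runs nothing. The file names, index directory, annotation file and thread count are hypothetical placeholders, and the options should be checked against the installed version of each tool.

```python
def star_align_cmd(read_files, genome_dir, out_prefix, threads=4):
    """Build a STAR command for gzip-compressed FASTQ input.

    read_files holds one file (single-end) or two files (paired-end).
    All paths are placeholders supplied by the caller.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", *[str(f) for f in read_files],
        "--readFilesCommand", "zcat",  # decompress .gz input on the fly
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", str(out_prefix),
    ]


def star_bam_path(out_prefix):
    """Name STAR gives to a coordinate-sorted BAM for this prefix."""
    return f"{out_prefix}Aligned.sortedByCoord.out.bam"


def feature_counts_cmd(bam_files, annotation_gtf, out_file, threads=4):
    """Build a featureCounts command that summarizes reads per gene.

    Single-end settings are assumed; paired-end libraries need extra
    options that differ between Subread releases, so they are left out.
    """
    return [
        "featureCounts",
        "-T", str(threads),
        "-a", str(annotation_gtf),
        "-o", str(out_file),
        *[str(b) for b in bam_files],
    ]


if __name__ == "__main__":
    # Hypothetical sample and reference names, for illustration only.
    align = star_align_cmd(["sample1.fastq.gz"], "star_index", "out/sample1_")
    count = feature_counts_cmd(
        [star_bam_path("out/sample1_")], "genes.gtf", "out/counts.txt"
    )
    print(" ".join(align))
    print(" ".join(count))
```

Keeping command construction separate from execution makes it possible to inspect or log each step before any compute time is spent.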
RNA-Seq raw data often have quality problems that can distort analytical
findings significantly and lead to incorrect
conclusions.63 For instance, the quality of raw
RNA-Seq data can be compromised by residual ribosomal RNA, RNA
degradation and variation in read coverage.63 Hence, to
obtain accurate transcript- or gene-level measurements and to extract
information from the data reliably, raw RNA-Seq data must be
reviewed and evaluated with quality control measures before subsequent
analyses are conducted.27,63 Presently, the most
widely used computational tools for RNA-Seq QC
include FastQC and MultiQC. FastQC processes one sample at a time,
while MultiQC can generate a single report that visualizes the output for
several samples from multiple tools, thereby allowing easy
comparison.64,65 Other important and commonly used
QC software includes RSeQC, RNA-SeQC and
RNA-QC-Chain.66-68 Although both RSeQC and RNA-SeQC can
offer QC statistics for aligned reads, RSeQC partially relies on
the University of California Santa Cruz (UCSC) Genome
Browser.67 Moreover, both are slow and cannot
perform sequence trimming or contaminant filtering. In contrast,
RNA-QC-Chain can remove low-quality reads and contamination, in addition
to providing fast and reliable QC that yields data ready for downstream
analysis.63 The steps of an RNA-Seq data analysis depend entirely
on the data quality and the specific aims of the study. These steps
have been reviewed in detail elsewhere.27,69
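
The division of labour between the two most common QC tools described above can be sketched as follows: FastQC is called once per FASTQ file, and MultiQC is then pointed at the directory that holds the per-sample reports. This is a minimal sketch under stated assumptions. The file names and the `qc` output directory are hypothetical, and the helper `run_qc` assumes that both tools are installed and available on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def qc_commands(fastq_files, qc_dir):
    """Return one FastQC command per file plus one MultiQC command.

    FastQC writes its per-sample reports into qc_dir; MultiQC scans
    that directory and writes a combined report into qc_dir/multiqc.
    """
    qc_dir = Path(qc_dir)
    per_sample = [
        ["fastqc", "--outdir", str(qc_dir), str(fastq)]
        for fastq in fastq_files
    ]
    summary = ["multiqc", str(qc_dir), "--outdir", str(qc_dir / "multiqc")]
    return per_sample, summary


def run_qc(fastq_files, qc_dir):
    """Run the commands, failing early if a tool is missing."""
    for tool in ("fastqc", "multiqc"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on PATH")
    # FastQC expects the output directory to exist already.
    Path(qc_dir).mkdir(parents=True, exist_ok=True)
    per_sample, summary = qc_commands(fastq_files, qc_dir)
    for cmd in per_sample:
        subprocess.run(cmd, check=True)
    subprocess.run(summary, check=True)


if __name__ == "__main__":
    # Hypothetical input files; nothing is executed here, only printed.
    per_sample, summary = qc_commands(
        ["tumour_1.fastq.gz", "normal_1.fastq.gz"], "qc"
    )
    for cmd in per_sample + [summary]:
        print(" ".join(cmd))
```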
RNA-Seq analysis relies on computationally intensive tools
to build pipelines that orchestrate the
entire workflow and optimize the use of available computational
resources.67 The development of such analytic tools
for RNA-Seq data has expanded owing to the complex nature of transcriptome
data; consequently, the choice of processing pipeline and
normalization strategy has a significant impact on downstream
analysis.70 Such pipelines consist of multiple
independent analytical software packages, tools and platforms, which
employ R, Python, Unix/Bash, JavaScript, Perl and C++. Because
these tools run in programmable environments, they allow flexible
manipulation of data and methods. However, they require the user to
have programming expertise, especially in Bash or the
Unix command line.71 With the growing application of
RNA-Seq in biomedical research, integrated, user-friendly platforms
are needed to overcome the barriers encountered with code-based
platforms. Graphical user interface (GUI) and web-based platforms
provide a convenient environment for non-experts and are
well suited to quick exploratory analysis, although not at the scale
of large datasets.71 Table 1 provides a
summary of the various computational tools and their associated
platforms used in RNA-Seq analyses.
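
To make the point about normalization concrete, the sketch below applies counts per million (CPM) scaling to two samples. CPM is chosen here only as a simple, widely known example and is an assumption for illustration; the text above does not single out a particular strategy, and CPM corrects for sequencing depth only, not for gene length or library composition. The gene names and counts are invented.

```python
def counts_per_million(counts):
    """Scale raw read counts for one sample to counts per million.

    counts maps a gene identifier to its raw read count.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


if __name__ == "__main__":
    # Two invented samples with identical composition; the second was
    # sequenced twice as deeply, so every raw count is doubled.
    shallow = {"geneA": 100, "geneB": 300, "geneC": 600}
    deep = {"geneA": 200, "geneB": 600, "geneC": 1200}

    # Raw counts differ, but the depth-normalized values agree.
    print(counts_per_million(shallow))
    print(counts_per_million(deep))
```

Without the scaling step, a comparison of raw counts would report a twofold difference for every gene that reflects sequencing depth rather than biology.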
The results of RNA-Seq analyses may vary with the platform and
analytical framework used. The large number of
computational tools and bioinformatics methods currently in
use adds further challenges to the analysis and interpretation of
RNA-Seq data. To address the challenges caused by variation in
RNA-Seq analysis techniques, standard pipelines need to be adopted and
redesigned so that analyses of multiple experiments can be integrated.
Workflow construction software packages such as
Chipster,72 Anduril73,74 and
Galaxy75 could be highly relevant in solving some
of these challenges. For example, Anduril was developed for designing
complex RNA-Seq pipelines for large-scale datasets that require
automated parallelization, while Chipster and Galaxy are powerful for
integrative data visualization, which makes them very useful for data
exploration and interpretation. Other workflow management
frameworks for RNA-Seq analysis include KNIME,76 which aids
the visual assembly and interactive execution of data pipelines, and
Snakemake,77 a Python-based workflow
management engine that provides a powerful execution environment.
Workflow management frameworks that focus specifically on RNA-Seq data
analysis have been reviewed elsewhere.83 In addition, the
large-scale nature of the data analyses associated with RNA-Seq brings
many challenges that are beyond the scope of this review. Han and
colleagues78 reviewed these challenges
comprehensively and proposed solutions. Moreover, RNA-Seq
studies of tumours have revealed molecular subsets with distinct cellular
signatures and microenvironments, which helps guide choices to circumvent
treatment failure.79 Thus, single-cell RNA sequencing
(scRNA-Seq) may prove to be the right method for understanding tumour
progression and pathogenesis and for discovering biomarkers that could
lead to better treatment and management of bladder cancer.
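
The automated parallelization attributed above to workflow engines such as Anduril and Snakemake rests on a simple idea: each task declares which tasks it depends on, and the engine runs together all tasks whose dependencies are already satisfied. The sketch below illustrates that idea with the `graphlib` module from the Python standard library (Python 3.9 or later). It is not how any of the named tools is implemented; the task names and sample labels are hypothetical.

```python
from graphlib import TopologicalSorter


def build_graph(samples):
    """Map each task to the set of tasks that must finish before it."""
    graph = {}
    for sample in samples:
        graph[f"qc:{sample}"] = set()
        graph[f"align:{sample}"] = {f"qc:{sample}"}
        graph[f"count:{sample}"] = {f"align:{sample}"}
    # Differential expression needs the counts from every sample.
    graph["differential_expression"] = {f"count:{s}" for s in samples}
    return graph


def parallel_batches(graph):
    """Group tasks into batches; tasks in one batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(build_graph(["tumour", "normal"])), 1):
        print(number, batch)
```

For two samples this yields four batches: QC, alignment and counting each run for both samples at once, and differential expression runs last because it depends on every count task.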