Beyond estimating diversity: Exciting advances in statistics
Modelling advances in community ecology offer exciting opportunities to
understand the complex patterns in microbial diversity and complement
robust sampling designs (e.g., Grantham et al., 2020; Trego et al.,
2022). In addition, novel methods for analysing amplicon sequencing data
are continuously emerging, primarily focused on the human gut microbiome
but adaptable to other microbial ecology fields with suitable study
designs and datasets (e.g., Trego et al., 2022). Broadly, these tools
can be categorized into quantifying community assembly processes,
mapping occurrence networks, capturing spatial/temporal dynamics,
integrating multi-omics, identifying differentially abundant taxa,
finding species-environment associations, and predicting functional
patterns (Trego et al., 2022). However, despite the frequent use of
high-throughput sequencing, there has been a slow uptake of these new
analytical techniques, and many studies do not go much beyond basic
comparisons of alpha and beta diversity estimates across samples. While
important inferences can be made by examining overall patterns of
composition and diversity (e.g., Grosser et al., 2019; Motta et al.,
2018), such comparisons offer only a starting point toward a more
mechanistic understanding of the ecological drivers of microbiome
variation (Shade, 2017).
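To make concrete what these basic comparisons compute, the following is a minimal sketch of one alpha diversity estimate (the Shannon index) and one beta diversity estimate (Bray-Curtis dissimilarity). The function names and the toy count vectors are hypothetical, and the sketch ignores rarefaction and other normalisation steps that a real analysis would need.

```python
import numpy as np

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a - b).sum() / (a + b).sum())

# Toy samples (hypothetical): counts for three taxa in two samples.
sample_1 = [10, 10, 0]
sample_2 = [0, 10, 10]

alpha_1 = shannon_index(sample_1)        # within-sample diversity
beta_12 = bray_curtis(sample_1, sample_2)  # between-sample dissimilarity
print(alpha_1, beta_12)
```

Both quantities summarise each sample or sample pair as a single number, which is why they say little on their own about the processes generating the differences.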
Common analytical approaches to quantify differences in beta diversity
across microbiome samples, such as the permutational multivariate
analysis of variance (PERMANOVA), are algorithmic (i.e., not based on a
statistical model) and do not explicitly account for uncertainty in
ecological data (Björk et al., 2018; Warton et al., 2012, 2015).
Importantly, making inferences about microbiome variation is often
difficult using algorithmic distance-based approaches (Björk et al.,
2018; Warton et al., 2012). Model-based approaches such as joint species
distribution models (JSDMs) or stacked models (Powell-Romero et al.,
2023) are multi-response extensions of generalized linear mixed models
(GLMMs) that can overcome some of the limitations of the algorithmic
methods to elucidate patterns of microbiome variation (e.g., Björk et
al., 2018; Grantham et al., 2020). Often using a Bayesian framework,
JSDMs simultaneously analyse multiple species and environmental
variables, allowing for the assessment of community-level responses to
environmental change and host effects (Björk et al., 2018; Ovaskainen et
al., 2017; Pollock et al., 2014). JSDMs can i) incorporate information
on species traits and phylogenetic relatedness, improving estimation
accuracy and power when there is a phylogenetic signal (Ovaskainen et
al., 2017), and ii) analyse patterns of taxon covariance to infer
microbial co-occurrence networks (Björk et al., 2018; Fountain-Jones et
al., 2020, 2023). Microbial co-occurrence networks are valuable tools in
microbiome science, as they offer insights (but see Current gaps
and future directions below) into the associations among microbial
taxa, enhancing our understanding of microbial community dynamics and
functioning. JSDM-based co-occurrence networks are also easier to
interpret, because the major environmental and host effects shaping
microbial occurrence are controlled for (i.e., an inferred association
between microbes is less likely to be a mere product of a shared
environmental response). However, GLMM-based JSDM co-occurrence networks
cannot untangle the relative roles of taxon associations and
environmental or host effects (Clark et al., 2018; Fountain-Jones et
al., 2020), and they tend not to scale well to large datasets (Pichler &
Hartig, 2021). Approaches such as conditional random fields (CRF, Clark
et al., 2018), multi-response interpretable machine learning (mrIML,
Fountain-Jones et al., 2021), MIMIX (Microbiome MIXed Model, Grantham et
al., 2020) and scalable JSDMs (sjSDM, Pichler & Hartig, 2021) can
overcome these limitations. Importantly, approaches such as mrIML and
MIMIX allow predictions and treatment effects to be extracted for
individual taxa, which can be useful if researchers have a set of focal
taxa. We note that these methods are not appropriate in all situations.
For particularly large datasets (thousands of samples), new
distance-based methods such as D-MANOVA (Chen & Zhang, 2021) or
multivariate distance matrix regression (MDMR, Zapala & Schork, 2012)
may be better options. Boshuizen & te Beest (2023) provide a
comprehensive guide to the pitfalls in analysing amplicon data. While the
tools mentioned here represent only a tiny fraction of the potential
methods available, we encourage readers to go beyond diversity metrics
and differentially abundant taxa to gain more mechanistic insights into
microbiome data from wild species.
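The idea that controlling for a shared environmental response changes how taxon associations are read can be illustrated with a minimal sketch. It is not a JSDM: it assumes Gaussian, log-scale abundances, a single hypothetical covariate, and ordinary least squares in place of a multi-response GLMM. All names and simulated values are illustrative assumptions rather than part of any of the methods cited above.

```python
import numpy as np

def residual_correlation(abundance, environment):
    """Correlate taxa after regressing each one on the environment.

    abundance:   (n_samples, n_taxa) array on a Gaussian scale
                 (e.g., log-transformed abundances; an assumption here).
    environment: (n_samples,) array holding one covariate.
    """
    design = np.column_stack([np.ones(len(environment)), environment])
    # Ordinary least squares fit of every taxon on the same design matrix.
    coef, *_ = np.linalg.lstsq(design, abundance, rcond=None)
    residuals = abundance - design @ coef
    return np.corrcoef(residuals, rowvar=False)

# Simulated data (hypothetical): both taxa track the same gradient,
# but there is no direct association between them.
rng = np.random.default_rng(42)
n_samples = 500
environment = rng.normal(size=n_samples)
taxon_a = 2.0 * environment + rng.normal(size=n_samples)
taxon_b = 1.5 * environment + rng.normal(size=n_samples)
abundance = np.column_stack([taxon_a, taxon_b])

raw_corr = np.corrcoef(abundance, rowvar=False)[0, 1]
resid_corr = residual_correlation(abundance, environment)[0, 1]
print(f"raw correlation: {raw_corr:.2f}")
print(f"residual correlation: {resid_corr:.2f}")
```

In this simulation the raw correlation is strongly positive, whereas the residual correlation is expected to be close to zero, because the only link between the two taxa is the shared gradient. A JSDM applies the same logic with appropriate error distributions, multiple covariates, and uncertainty estimates.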
Incorporating methodological advances in bioinformatics and statistics,
coupled with robust study design and rigorous laboratory techniques,
will improve current research efforts in the field (see Fig. 1 for a
summary). Moreover, considering both the limitations and the
opportunities of these approaches will allow us to open exciting new
avenues in microbiome ecology research.