Notes. N: Sample size; SD: Standard Deviation.
Next, using Equation 1, we standardise the data with the pooled SD of
the outcome among participants at the post-treatment time point. First,
we calculate both pooled sample SDs (baseline and post-treatment):
\begin{equation}
\text{SD}_{\text{pooled}}=\sqrt{\frac{{\text{SD}_{t}}^{2}\cdot\left(n_{t}-1\right)+{\text{SD}_{c}}^{2}\cdot\left(n_{c}-1\right)}{n_{t}+n_{c}-2}}\nonumber
\end{equation}
Equation 2. Calculation of a pooled sample SD. The subscript ‘t’
indicates the treatment arm, and ‘c’ the control arm.
Pooled sample SD at baseline time point:
\begin{equation}
\text{SD}_{\text{pooled}}=\sqrt{\frac{2.5^{2}\cdot\left(143-1\right)+3.1^{2}\cdot\left(125-1\right)}{143+125-2}}=2.796\nonumber
\end{equation}
Pooled sample SD at post-treatment time point:
\begin{equation}
\text{SD}_{\text{pooled}}=\sqrt{\frac{2.5^{2}\cdot\left(125-1\right)+2.9^{2}\cdot\left(125-1\right)}{125+125-2}}=2.707\nonumber
\end{equation}
Next, we convert arm-based data into contrast-based data (i.e., a single
effect measure that summarises the MD between the two study-arms) using
Equations 3 and 4.
\begin{equation}
MD=\text{Mean score}_{\text{treatment}}-\text{Mean score}_{\text{control}}\nonumber
\end{equation}
Equation 3. MD computation at post-treatment time point.
\begin{equation}
\text{SE}_{\text{MD}}=\sqrt{\frac{n_{t}+n_{c}}{n_{t}\cdot n_{c}}\cdot\frac{{\text{SD}_{t}}^{2}\cdot\left(n_{t}-1\right)+{\text{SD}_{c}}^{2}\cdot\left(n_{c}-1\right)}{n_{t}+n_{c}-2}}\nonumber
\end{equation}
Equation 4. Computation of the SE of the MD at post-treatment time
point.
\begin{equation}
MD=3.2-3.8=-0.6\nonumber
\end{equation}
\begin{equation}
\text{SE}_{\text{MD}}=\sqrt{\frac{125+125}{125\cdot 125}\cdot\frac{2.5^{2}\cdot\left(125-1\right)+2.9^{2}\cdot\left(125-1\right)}{125+125-2}}=0.342\nonumber
\end{equation}
Then, we standardise our MD and SE by dividing them by the
corresponding pooled sample SDs. Methodologists favour the use of
pooled SDs at baseline over follow-up SDs, but studies commonly report
only follow-up data. Therefore, we standardise the data in both cases:
(1) assuming that baseline data are available; and (2) assuming that
only follow-up data are available.
Standardised data (SMD and SE) using the pooled sample SD at baseline
time point.
\begin{equation}
SMD=\frac{\text{MD}}{\text{SD}_{\text{pooled}}}=\frac{-0.6}{2.796}=-0.215\nonumber
\end{equation}
\begin{equation}
\text{SE}_{\text{SMD}}=\frac{\text{SE}_{\text{MD}}}{\text{SD}_{\text{pooled}}}=\frac{0.342}{2.796}=0.122\nonumber
\end{equation}
Standardised data (SMD and SE) using the pooled sample SD at
post-treatment time point.
\begin{equation}
SMD=\frac{\text{MD}}{\text{SD}_{\text{pooled}}}=\frac{-0.6}{2.707}=-0.222\nonumber
\end{equation}
\begin{equation}
\text{SE}_{\text{SMD}}=\frac{\text{SE}_{\text{MD}}}{\text{SD}_{\text{pooled}}}=\frac{0.342}{2.707}=0.126\nonumber
\end{equation}
Although this method is the most commonly applied in meta-analyses, the
use of a fixed scale-specific SD reference is recommended [6,7]. A
more in-depth explanation of this method can be found in Gallardo-Gómez
et al. (2023) [3] and the online content.
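The worked example above can be reproduced in a few lines of code. The following is a minimal Python sketch (function names are our own) implementing Equations 2--4 and the standardisation step with the values reported in the text:

```python
from math import sqrt

def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled sample SD (Equation 2)."""
    return sqrt((sd_t**2 * (n_t - 1) + sd_c**2 * (n_c - 1)) / (n_t + n_c - 2))

def se_md(sd_t, n_t, sd_c, n_c):
    """SE of the mean difference (Equation 4)."""
    return sqrt((n_t + n_c) / (n_t * n_c)) * pooled_sd(sd_t, n_t, sd_c, n_c)

# Values from the worked example
sd_baseline = pooled_sd(2.5, 143, 3.1, 125)   # ~2.796
sd_post = pooled_sd(2.5, 125, 2.9, 125)       # ~2.707
md = 3.2 - 3.8                                # -0.6 (Equation 3)
se = se_md(2.5, 125, 2.9, 125)                # ~0.342

# Standardisation with baseline and post-treatment pooled SDs
smd_baseline = md / sd_baseline               # ~-0.215
se_smd_baseline = se / sd_baseline            # ~0.122
smd_post = md / sd_post                       # ~-0.222
se_smd_post = se / sd_post                    # ~0.126
```

Note that Equation 4 factors the pooled variance out of the SE, so `se_md` simply reuses `pooled_sd`.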
HOW TO INTERPRET STANDARDISED MEAN DIFFERENCES
The SMDs express the size of the treatment effect in each study relative
to the variability observed in that study. However, the overall
treatment effect could be difficult to interpret as it is reported in
units of standard deviation rather than the original units of
measurement. Without guidance, clinicians and patients may have little
idea how to interpret results presented as SMDs. There are two
possibilities for re-expressing such results in more helpful
ways:
Re-expressing SMDs using rules of thumb for effect sizes. One
example based on Cohen (1988) [8] is as follows: 0.2 represents a
small effect; 0.5 a moderate effect; and 0.8 a large effect.
Nevertheless, some methodologists believe that such interpretations
are problematic because the importance of a finding is
context-dependent and not amenable to generic statements [7].
Re-expressing SMDs using a familiar instrument. The second
(and recommended) option is to re-express the SMD in the units of one
or more of the specific measurement instruments. This can be done by
multiplying the SMD by a typical among-person SD for a particular
scale (e.g., an external SD reference from a large cohort or
cross-sectional study that matches the target population, an internal
SD reference, or a pooled sample SD), preferably the same one used for
data standardisation [3]. In this way, using the original
scale-specific units, the clinical relevance and impact of a pooled
treatment effect can be interpreted more easily. In our example, when
the authors pooled all effect sizes, they obtained a pooled treatment
effect of SMD = 0.40 (95% Confidence Interval 0.02 to 0.77). We then
re-express this effect size in SPPB units by multiplying by the
external SD reference for the study population (external SD reference
= 3.14), obtaining a scale-specific pooled effect of MD = 0.97 (95%
CI 0.06 to 2.42). Considering a predefined minimal clinically
important difference of 1 point in the SPPB [9], we could support
the use of an intervention (physical activity in this case [4]) in
a specific population due to its clinically meaningful benefit in
the outcome of interest.
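The re-expression step is a simple multiplication. Below is a minimal sketch (the function name is illustrative) using the SD reference and the confidence-interval bounds reported in the example:

```python
def reexpress(smd, sd_reference):
    """Convert an SMD back to scale-specific units by multiplying
    by a chosen SD reference (here, SPPB points)."""
    return smd * sd_reference

SD_REF = 3.14  # external SD reference for the SPPB in the example

ci_lower = reexpress(0.02, SD_REF)  # ~0.06 SPPB points
ci_upper = reexpress(0.77, SD_REF)  # ~2.42 SPPB points
```

The same call applied to the pooled point estimate yields the scale-specific MD that is then compared against the minimal clinically important difference.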
COMMON PITFALLS USING STANDARDISED MEAN DIFFERENCES
- Unnecessary data standardisation. Reviewers do not need to
standardise their data when the outcome of interest is not assessed
with different scales. The belief that the term ‘effect size’ is a
synonym of ‘SMD’ can lead authors to report the treatment effect in
SMD units when this is not needed. One example is when only one
study is included in a forest plot; an SMD is not needed, and the
effect should be reported as an MD.
- Use of SEs rather than SDs to calculate SMDs. As we have seen
in Equation 1, we use the post-treatment pooled sample SD to
calculate SMDs. Nonetheless, primary studies may wrongly report the
SE of an assessment as the SD, or not specify whether they are
reporting the SD or the SE. A red flag for this is a very low SD
(i.e., <1), although this is highly dependent on the score range
of the specific scale. This mistake can lead to ‘effect size
inflation’: when SEs are used to calculate SMDs, the MD is divided
by a value smaller than the true SD, yielding a larger estimate.
Therefore, if you obtain SMDs greater than one, you should check
whether the SD or the SE has been used.
- Combination of change from baseline and post-treatment effect
measures. Although mixing change-from-baseline and post-treatment
outcomes is not a problem in a meta-analysis of MDs [7], they
should not in principle be combined using SMDs. This is because the
SDs used to standardise post-treatment values reflect between-person
variability at a single time point, whereas the SDs used to
standardise change scores reflect variation in between-person
changes over time, and so depend on both within-person variability
(dependent on the length of time between measurements) and
between-person variability [7].
- Effect size direction. There are scales where an improvement
in the outcome is reflected by a reduction in the score (e.g., in our
illustrative example, the less time spent walking a distance, the
better the functional capacity). In addition, to interpret the
magnitude of an effect, we must consider the specific outcome (e.g.,
a more negative effect could be positive if the review is
investigating depressive symptoms, as it means a reduction in those
symptoms). To correct an effect that is not in line with the
direction of our meta-analysis, we should multiply the effect size
by –1 (no modification is needed for the SD), ensuring that all
effects are in the same direction.
- No interpretation of SMDs. Many meta-analyses leave their
effect estimates as SMDs, which can make interpretation difficult.
Different options for re-expressing SMDs as more interpretable
estimates are discussed above.
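Two of the pitfalls above lend themselves to a quick numerical check. The sketch below uses illustrative values only (the sample size, SD, and function name are our own) to show how dividing by an SE instead of an SD inflates an SMD, and how the sign flip for effect direction works:

```python
from math import sqrt

# Pitfall: using the SE instead of the SD inflates the SMD.
n = 125            # illustrative sample size (assumed)
sd = 2.7           # illustrative scale SD (assumed)
se = sd / sqrt(n)  # the SE is sqrt(n) times smaller than the SD
md = -0.6

smd_correct = md / sd   # well below 1 in magnitude
smd_inflated = md / se  # sqrt(n) times larger in magnitude -- a red flag

# Pitfall: effect size direction. Flip the sign when a lower score
# means improvement; the SD/SE is left unchanged.
def align_direction(smd, lower_is_better):
    return -smd if lower_is_better else smd
```

An SMD above 1 produced this way (here roughly eleven times the correct value) is exactly the kind of result that should prompt a check of whether an SE was mistaken for an SD.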