Jack Wilkinson1
1 Centre for Biostatistics, Manchester Academic Health Science Centre, Division of Population Health, Health Services Research, and Primary Care.
Water, water everywhere, but not a drop to drink. Ewington et al. [BJOG CURRENT ISSUE] have undertaken a systematic review of prediction models for fetal macrosomia and large for gestational age. The authors identified 111 models, described in 58 studies. This finding alone should give us pause. We might ask whether it is a good use of resources to have so many models developed for the same purpose, or whether it might instead have been preferable for most of this time, money and effort to be directed elsewhere. Redundancy is not the only cause for alarm, however. The review authors note that, of the 111 models identified, none was ready for clinical implementation. To date, this massive effort has failed to benefit a single patient.
How can so much effort amount to so little? The authors critically appraised the included studies using the PROBAST tool (Wolff et al., Ann Intern Med 2019;170(1):51-58), judging only 5 of the 58 studies to be at low risk of bias. This suggests that, while a huge amount of work has been undertaken in the development of these models, regrettably, most of it has not been done proficiently. The authors drew attention to inadequate methods of analysis as a recurring limitation, which could include flaws in sample size determination, predictor selection, representation of predictors in the modelling process, handling of missing data, and assessment of model performance (for example, failing to consider model calibration). Even the models that were at low risk of bias were not suitable for implementation. Two relied on predictors that are not routinely measured, rendering them impracticable, while the remainder had not yet had their performance assessed in a separate dataset, an essential step in the validation of a model.
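To make the calibration point concrete, the following is a minimal sketch (in Python, using numpy and statsmodels; the data, variable names and "model" are entirely hypothetical and simulated for illustration, not taken from any of the reviewed studies) of how calibration-in-the-large and the calibration slope might be estimated when an existing model's predicted risks are compared with observed outcomes in a separate dataset.

    # Illustrative sketch only: hypothetical, simulated data, not any model from the review.
    # Calibration is assessed by relating observed outcomes to the model's predicted
    # risks on the logit (log-odds) scale.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    # Hypothetical external validation data: predicted risks from an existing model
    # and the observed binary outcome (e.g. large for gestational age: 1 = yes).
    n = 1000
    predicted_risk = rng.uniform(0.02, 0.60, size=n)                 # predicted probabilities
    observed = rng.binomial(1, np.clip(predicted_risk * 0.8, 0, 1))  # simulated outcomes

    # Linear predictor: predicted risks on the logit scale.
    logit_pred = np.log(predicted_risk / (1 - predicted_risk))

    # Calibration slope: logistic regression of the outcome on the linear predictor
    # (ideal value: 1; values below 1 suggest the model is overfitted).
    slope_fit = sm.GLM(observed, sm.add_constant(logit_pred),
                       family=sm.families.Binomial()).fit()
    calibration_slope = slope_fit.params[1]

    # Calibration-in-the-large: intercept with the slope fixed at 1, via an offset
    # (ideal value: 0; nonzero values indicate systematic over- or under-prediction).
    intercept_fit = sm.GLM(observed, np.ones((n, 1)), offset=logit_pred,
                           family=sm.families.Binomial()).fit()
    calibration_in_the_large = intercept_fit.params[0]

    print(f"Calibration slope: {calibration_slope:.2f}")
    print(f"Calibration-in-the-large: {calibration_in_the_large:.2f}")

Reporting such measures in a genuinely separate dataset is part of what the review found to be missing from most of the identified models.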
Massive waste in prediction research appears to be the status quo across medicine. To some extent, this might be attributable to the naiveté of researchers, who may be unfamiliar with methodological standards for the development and evaluation of prognostic models, and perhaps also with the fact that a model developed using flawed methods might do more harm than good. Prognostic research should not be undertaken without sufficient methodological expertise. Another reason might be that much research is done not so much for patient benefit as for professional benefit. This provides an incentive to throw another model onto an ever-expanding pile in order to add another line to one's CV. The fact that it is straightforward to develop a poor, albeit publishable, model exacerbates the issue; all one requires is a dataset, some statistical software, and a gung-ho attitude. It would typically be preferable to consider whether potentially suitable models already exist, and to undertake external validation studies of these. Systematic reviews play an important role here, by detailing everything that has been done to date. To illustrate, the present review identified two models developed using sound methodology, which could now be subjected to external validation. Finally, journals have an important gatekeeping role to play, by refusing to publish models lacking clear justification and rigorous methods.