This commentary discusses a framework for benchmarking hydrological models for different purposes when the datasets for different catchments may involve epistemic uncertainties. The approach can be expected to yield an ensemble of models, including models of different types, that may be used in prediction. It also allows model rejection to serve as the start of a learning process that improves understanding.