Reihaneh Zarrabi

and 3 more

Widely adopted models for estimating channel geometry attributes rely on simplistic power-law (hydraulic geometry) equations. This study presents a new generation of channel geometry models based on a hybrid approach that combines a traditional statistical method, Multi-Linear Regression (MLR), with advanced tree-based Machine Learning (ML) algorithms, Random Forest Regression (RFR) and eXtreme Gradient Boosting Regression (XGBR), and that draws on novel datasets. To achieve this, a new preprocessing method was applied to refine an extensive observational dataset, the HYDRoacoustic dataset supporting Surface Water Oceanographic Topography (HYDRoSWOT). This preprocessing improved data quality and identified the observations that represent bankfull and mean-flow conditions. The preprocessed dataset was then combined with datasets containing additional catchment attributes, such as the National Hydrography Dataset Plus (NHDPlusV2.1), and the compiled dataset was used to train a suite of models that predict channel width and depth under bankfull and mean-flow conditions. The analysis shows that tree-based ML algorithms outperform traditional statistical methods in accuracy and in handling the data, but are limited in their ability to predict for streams whose characteristics fall outside the training range. Consequently, a hybrid method was selected that uses XGBR for streams within the dataset range and MLR for those outside it. Two tiers of models were developed for each attribute, the first using discharges derived from HYDRoSWOT and the second using discharges derived from NHDPlusV2.1; the second tier is applicable to approximately 2.6 million streams within NHDPlusV2.1. Comprehensive independent evaluations were conducted to assess how well the developed models provide stream/reach-averaged (rather than at-a-station) predictions for locations outside the training and testing datasets.
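
The hybrid selection rule described above can be illustrated with a minimal sketch. It is not the study's implementation: the predictor names (`discharge`, `drainage_area`), the per-feature min/max range check, and all function names are assumptions made for illustration. The tree-based model is represented by an arbitrary callable standing in for a fitted XGBR model, and the MLR fallback is reduced to a single-predictor power law fitted in log space, which is the classic hydraulic geometry form.

```python
import math
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Features = Mapping[str, float]
Model = Callable[[Features], float]


def fit_power_law(q: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(y) = log(a) + b*log(q); returns (a, b)."""
    xs = [math.log(v) for v in q]
    ys = [math.log(v) for v in y]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def make_power_law_model(a: float, b: float, key: str = "discharge") -> Model:
    """Wrap y = a * q**b as a model over a feature mapping (hypothetical key)."""
    return lambda x: a * x[key] ** b


def training_ranges(rows: List[Features]) -> Dict[str, Tuple[float, float]]:
    """Per-feature (min, max) observed in the training rows."""
    return {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in rows[0]}


def in_range(x: Features, ranges: Mapping[str, Tuple[float, float]]) -> bool:
    """True only if every feature lies inside its training range."""
    return all(lo <= x[k] <= hi for k, (lo, hi) in ranges.items())


def hybrid_predict(x: Features, ranges: Mapping[str, Tuple[float, float]],
                   tree_model: Model, fallback_model: Model) -> float:
    """Use the tree-based model inside the training range, the regression outside."""
    model = tree_model if in_range(x, ranges) else fallback_model
    return model(x)
```

In this sketch a stream is routed to the regression fallback as soon as any single predictor leaves its training range. That is one simple way to define "outside the training range"; the abstract does not state the exact criterion used.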