River water quality index prediction and uncertainty analysis: A comparative study of machine learning models

Seyed Babak Haji Seyed Asadollah*, Ahmad Sharafati*, Davide Motta*, Zaher Mundher Yaseen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

143 Citations (Scopus)


The Water Quality Index (WQI) is the most common indicator to characterize surface water quality. This study introduces a new ensemble machine learning model called Extra Tree Regression (ETR) for predicting monthly WQI values at the Lam Tsuen River in Hong Kong. The ETR model performance is compared with that of the classic standalone models, Support Vector Regression (SVR) and Decision Tree Regression (DTR). The monthly input water quality data including Biochemical Oxygen Demand (BOD), Chemical Oxygen Demand (COD), Dissolved Oxygen (DO), Electrical Conductivity (EC), Nitrate-Nitrogen ($\mathrm{NO_3}$-N), Nitrite-Nitrogen ($\mathrm{NO_2}$-N), Phosphate ($\mathrm{PO_4^{3-}}$), potential for Hydrogen (pH), Temperature (T) and Turbidity (TUR) are used for building the prediction models. Various input data combinations are investigated and assessed in terms of prediction performance, using numerical indices and graphical comparisons. The analysis shows that the ETR model generally produces more accurate WQI predictions for both training and testing phases. Although including all the ten input variables achieves the highest prediction performance ($R^2_{test} = 0.98$, $RMSE_{test} = 2.99$), a combination of input parameters including only BOD, Turbidity and Phosphate concentration provides the second highest prediction accuracy ($R^2_{test} = 0.97$, $RMSE_{test} = 3.74$). The uncertainty analysis relative to model structure and input parameters highlights a higher sensitivity of the prediction results to the former. In general, the ETR model represents an improvement on previous approaches for WQI prediction, in terms of prediction performance and reduction in the number of input parameters.
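The two numerical indices quoted in the abstract, the coefficient of determination and the root mean square error, can be illustrated with a minimal sketch. It assumes the common definitions $R^2 = 1 - SS_{res}/SS_{tot}$ and $RMSE = \sqrt{\frac{1}{n}\sum (o_i - p_i)^2}$; the abstract does not state which formulation the authors used (some studies report the squared correlation coefficient instead). The function names and the sample values below are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical WQI values for illustration only (not data from the study).
observed_wqi = [60.0, 70.0, 80.0, 90.0]
predicted_wqi = [62.0, 69.0, 78.0, 91.0]

print(f"RMSE = {rmse(observed_wqi, predicted_wqi):.3f}")
print(f"R^2  = {r_squared(observed_wqi, predicted_wqi):.3f}")
```

RMSE is expressed in the same units as the WQI itself, while $R^2$ is dimensionless, which is why the two are usually reported together when comparing models.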
Original language: English
Article number: 104599
Number of pages: 14
Journal: Journal of Environmental Chemical Engineering
Issue number: 1
Early online date: 18 Oct 2020
Publication status: Published - 1 Feb 2021

