TY - JOUR
T1 - An Explainable Supervised Machine Learning Model for Predicting Respiratory Toxicity of Chemicals Using Optimal Molecular Descriptors
AU - Jaganathan, Keerthana
AU - Tayara, Hilal
AU - Chong, Kil To
N1 - Funding information: This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government(MSIT) (No. 2020R1A2C2005612) and in part by the Brain Research Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (No. NRF-2017M3C7A1044816).
PY - 2022/4/11
Y1 - 2022/4/11
N2 - Respiratory toxicity is a serious public health concern caused by the adverse effects of drugs or chemicals, so the pharmaceutical and chemical industries demand reliable and precise computational tools to assess the respiratory toxicity of compounds. The purpose of this study is to develop quantitative structure-activity relationship models for a large dataset of chemical compounds associated with respiratory system toxicity. First, several feature selection techniques are explored to find the optimal subset of molecular descriptors for efficient modeling. Then, eight different machine learning algorithms are utilized to construct respiratory toxicity prediction models. The support vector machine classifier outperforms all other optimized models in 10-fold cross-validation. Additionally, it outperforms the prior study by 2% in prediction accuracy and 4% in MCC. The best SVM model achieves a prediction accuracy of 86.2% and a MCC of 0.722 on the test set. The proposed SVM model predictions are explained using the SHapley Additive exPlanations approach, which prioritizes the relevance of key modeling descriptors influencing the prediction of respiratory toxicity. Thus, our proposed model would be incredibly beneficial in the early stages of drug development for predicting and understanding potential respiratory toxic compounds.
AB - Respiratory toxicity is a serious public health concern caused by the adverse effects of drugs or chemicals, so the pharmaceutical and chemical industries demand reliable and precise computational tools to assess the respiratory toxicity of compounds. The purpose of this study is to develop quantitative structure-activity relationship models for a large dataset of chemical compounds associated with respiratory system toxicity. First, several feature selection techniques are explored to find the optimal subset of molecular descriptors for efficient modeling. Then, eight different machine learning algorithms are utilized to construct respiratory toxicity prediction models. The support vector machine classifier outperforms all other optimized models in 10-fold cross-validation. Additionally, it outperforms the prior study by 2% in prediction accuracy and 4% in MCC. The best SVM model achieves a prediction accuracy of 86.2% and a MCC of 0.722 on the test set. The proposed SVM model predictions are explained using the SHapley Additive exPlanations approach, which prioritizes the relevance of key modeling descriptors influencing the prediction of respiratory toxicity. Thus, our proposed model would be incredibly beneficial in the early stages of drug development for predicting and understanding potential respiratory toxic compounds.
KW - respiratory toxicity
KW - molecular descriptors
KW - feature selection
KW - machine learning
KW - SHapley Additive exPlanations
UR - https://www.scopus.com/pages/publications/85129120185
U2 - 10.3390/pharmaceutics14040832
DO - 10.3390/pharmaceutics14040832
M3 - Article
SN - 1999-4923
VL - 14
JO - Pharmaceutics
JF - Pharmaceutics
IS - 4
M1 - 832
ER -