XGBoost algorithm assisted multi-component quantitative analysis with Raman spectroscopy

Qiaoyun Wang*, Xin Zou, Yinji Chen, Ziheng Zhu, Chongyue Yan, Peng Shan, Shuyu Wang, Yongqing Fu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

To improve prediction performance and reduce artifacts in Raman spectra, we developed an eXtreme Gradient Boosting (XGBoost) preprocessing method for the Raman spectra of glucose, glycerol and ethanol mixtures. To verify the robustness and reliability of the XGBoost preprocessing method, two models were developed: an X-LR model, which combines XGBoost preprocessing with a linear regression (LR) model, and an X-MLP model, which combines XGBoost preprocessing with a multilayer perceptron (MLP) model. These two models were used to quantify the concentrations of glucose, glycerol and ethanol from the Raman spectra of mixed solutions. A proportion map of hyperparameters was first used to narrow the hyperparameter search range for the X-LR and X-MLP models. The coefficient of determination (R²), root mean square error of calibration (RMSEC), and root mean square error of prediction (RMSEP) were then used to evaluate the models’ performance. Experimental results indicated that the XGBoost preprocessing method achieved higher accuracy and better generalization capability than other preprocessing methods for both the LR and MLP models.
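As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch rather than the authors' code. It assumes that RMSEC and RMSEP are the same root mean square error computed on the calibration set and the prediction (test) set respectively, divided by the number of samples (some chemometrics conventions use a degrees-of-freedom correction instead), and that R2 is the usual coefficient of determination. The function names and the concentration values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted concentrations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted concentrations (arbitrary units),
# made up for illustration only; they are not data from the paper.
cal_true = [1.0, 2.0, 3.0, 4.0]
cal_pred = [1.1, 1.9, 3.2, 3.8]
test_true = [1.5, 2.5, 3.5]
test_pred = [1.4, 2.7, 3.3]

rmsec = rmse(cal_true, cal_pred)          # error on the calibration set
rmsep = rmse(test_true, test_pred)        # error on the prediction set
r2_pred = r_squared(test_true, test_pred)

print(f"RMSEC={rmsec:.4f}  RMSEP={rmsep:.4f}  R2={r2_pred:.4f}")
```

In a multi-component setting such as the glucose, glycerol and ethanol mixtures described here, these metrics would typically be computed separately for each analyte. An RMSEP much larger than the RMSEC is a common sign of overfitting to the calibration set.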

Original language: English
Article number: 124917
Journal: Spectrochimica Acta - Part A: Molecular and Biomolecular Spectroscopy
Volume: 323
Early online date: 31 Jul 2024
Publication status: E-pub ahead of print - 31 Jul 2024

Keywords

  • Glucose
  • Linear regression
  • Multilayer perceptron
  • Raman spectroscopy
  • XGBoost
