TY - JOUR
T1 - Fusion of convolutional neural network with XGBoost feature extraction for predicting multi-constituents in corn using near infrared spectroscopy
AU - Zou, Xin
AU - Wang, Qiaoyun
AU - Chen, Yinji
AU - Wang, Jilong
AU - Xu, Shunyuan
AU - Zhu, Ziheng
AU - Yan, Chongyue
AU - Shan, Peng
AU - Wang, Shuyu
AU - Fu, Yong Qing
PY - 2024/8/31
Y1 - 2024/8/31
N2 - Near-infrared (NIR) spectroscopy has been widely used to predict multiple constituents of corn in agriculture. However, directly extracting constituent information from NIR spectra is challenging because of their broad, overlapping, and non-specific absorption bands. To address these problems and extract implicit features from raw NIR spectra to improve the performance of quantitative models, this paper proposes a one-dimensional shallow convolutional neural network (CNN) model based on an eXtreme Gradient Boosting (XGBoost) feature extraction method. Leaf-node feature information from the XGBoost model was encoded and reconstructed to obtain implicit features of the raw NIR spectra. A two-parametric Swish (TSwish or TS) activation function was proposed to improve the performance of the CNN, and elastic net (EN) regularization was applied to prevent overfitting of the CNN model. The performance of the developed XGBoost-CNN-TS-EN model was evaluated on two public NIR spectroscopy datasets, one of corn and one of soil. On the test set, the coefficients of determination (R²) for moisture, oil, protein, and starch in corn were 0.993, 0.991, 0.998, and 0.992, respectively, and the R² for soil organic matter was 0.992. The XGBoost-CNN-TS-EN model exhibits superior stability, good prediction accuracy, and good generalization ability, demonstrating its great potential for quantitative analysis of multiple constituents in spectroscopic applications.
AB - Near-infrared (NIR) spectroscopy has been widely used to predict multiple constituents of corn in agriculture. However, directly extracting constituent information from NIR spectra is challenging because of their broad, overlapping, and non-specific absorption bands. To address these problems and extract implicit features from raw NIR spectra to improve the performance of quantitative models, this paper proposes a one-dimensional shallow convolutional neural network (CNN) model based on an eXtreme Gradient Boosting (XGBoost) feature extraction method. Leaf-node feature information from the XGBoost model was encoded and reconstructed to obtain implicit features of the raw NIR spectra. A two-parametric Swish (TSwish or TS) activation function was proposed to improve the performance of the CNN, and elastic net (EN) regularization was applied to prevent overfitting of the CNN model. The performance of the developed XGBoost-CNN-TS-EN model was evaluated on two public NIR spectroscopy datasets, one of corn and one of soil. On the test set, the coefficients of determination (R²) for moisture, oil, protein, and starch in corn were 0.993, 0.991, 0.998, and 0.992, respectively, and the R² for soil organic matter was 0.992. The XGBoost-CNN-TS-EN model exhibits superior stability, good prediction accuracy, and good generalization ability, demonstrating its great potential for quantitative analysis of multiple constituents in spectroscopic applications.
KW - Activation function
KW - Convolutional neural network
KW - Elastic net
KW - Near-infrared spectroscopy
KW - XGBoost feature extraction
UR - http://www.scopus.com/inward/record.url?scp=85202992395&partnerID=8YFLogxK
U2 - 10.1016/j.foodchem.2024.141053
DO - 10.1016/j.foodchem.2024.141053
M3 - Article
AN - SCOPUS:85202992395
SN - 0308-8146
VL - 463
SP - 1
EP - 10
JO - Food Chemistry
JF - Food Chemistry
IS - Part 1
M1 - 141053
ER -