TY - JOUR
T1 - Causal Bio-miner: Response Biomarkers Discovery Framework for Microarray Transcriptomics Treatment Subgroups Classification
AU - El-Nabawy, Ala'a
AU - Alshabrawy, Ossama
AU - Woo, Wai Lok
PY - 2025/11/22
Y1 - 2025/11/22
N2 - In this paper, a response biomarker discovery framework based on discriminant analysis and causal inference is introduced. The framework has two main stages: causal bio-mining and biomarker validation. In the causal bio-mining stage, significant biomarkers are extracted from the randomized controlled trial (RCT) dataset using several techniques: discriminant analysis, feature ranking, statistical significance, and association scoring. The extracted biomarkers are then assessed with respect to treatment group classification using propensity score matching, a causal inference technique. When applied to subgroup classification, the causal biomarkers gave better accuracy while using the minimum possible number of features, provided their causal estimate is higher than 0.15 for both the treated and the control groups. The proposed framework’s efficacy was confirmed on two publicly available datasets: LiTMUS (GEO: GSE45484) and Breast Cancer (GEO: GSE20271). The performance of the framework was compared to established techniques, including those based on statistical variance and diagonal linear discriminant analysis (DLDA). The proposed framework outperformed these benchmark methods. Using 3 features, the Lithium subgroup classification accuracy is 83.33%, while the Non-Lithium subgroup classification accuracy is 93.75%, based on a causal score >= 0.2. Meanwhile, using 12 features, the FAC×6 subgroup classification accuracy is 81.90%, and using 13 features, the T/FAC subgroup classification accuracy is 92.70%, based on a causal score >= 0.15.
AB - In this paper, a response biomarker discovery framework based on discriminant analysis and causal inference is introduced. The framework has two main stages: causal bio-mining and biomarker validation. In the causal bio-mining stage, significant biomarkers are extracted from the randomized controlled trial (RCT) dataset using several techniques: discriminant analysis, feature ranking, statistical significance, and association scoring. The extracted biomarkers are then assessed with respect to treatment group classification using propensity score matching, a causal inference technique. When applied to subgroup classification, the causal biomarkers gave better accuracy while using the minimum possible number of features, provided their causal estimate is higher than 0.15 for both the treated and the control groups. The proposed framework’s efficacy was confirmed on two publicly available datasets: LiTMUS (GEO: GSE45484) and Breast Cancer (GEO: GSE20271). The performance of the framework was compared to established techniques, including those based on statistical variance and diagonal linear discriminant analysis (DLDA). The proposed framework outperformed these benchmark methods. Using 3 features, the Lithium subgroup classification accuracy is 83.33%, while the Non-Lithium subgroup classification accuracy is 93.75%, based on a causal score >= 0.2. Meanwhile, using 12 features, the FAC×6 subgroup classification accuracy is 81.90%, and using 13 features, the T/FAC subgroup classification accuracy is 92.70%, based on a causal score >= 0.15.
KW - Bio-markers
KW - Causal inference
KW - Discriminant analysis
KW - Response
KW - Transcriptomics
KW - Treatment
UR - https://www.scopus.com/pages/publications/105024750041
U2 - 10.1016/j.eswa.2025.130503
DO - 10.1016/j.eswa.2025.130503
M3 - Article
SN - 0957-4174
VL - 302
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 130503
ER -