Abstract
Head and neck squamous cell carcinoma (HNSCC) is a prevalent and aggressive cancer, and accurate staging using the AJCC system is essential for treatment planning. This study aims to enhance AJCC staging by integrating both clinical and imaging data in a multimodal deep learning pipeline. We propose a framework that employs a VGG16-based masked autoencoder (MAE) for self-supervised visual feature learning, enhanced by attention mechanisms (CBAM and BAM), and fuses image and clinical features through an attention-weighted fusion network. The models, benchmarked on the HNSCC and HN1 datasets, achieved approximately 80% accuracy in four-class staging and approximately 66% in five-class staging, with notable AUC improvements, particularly under BAM. Integrating clinical features significantly enhances stage-classification performance, setting a precedent for robust multimodal pipelines in radiomics-based oncology applications.
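The attention-weighted fusion step described above can be illustrated with a minimal sketch. The paper does not publish its implementation here, so the gating scheme below is an assumption: each modality (image features, clinical features) receives a scalar relevance score, the scores are passed through a softmax to obtain fusion weights, and the weighted feature vectors are summed. Function and variable names (`attention_weighted_fusion`, `img_score`, `clin_score`) are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weighted_fusion(img_feat, clin_feat, img_score, clin_score):
    """Fuse two same-length feature vectors by softmax-normalized
    modality weights (a hypothetical stand-in for the paper's
    attention-weighted fusion network)."""
    w_img, w_clin = softmax([img_score, clin_score])
    fused = [w_img * i + w_clin * c for i, c in zip(img_feat, clin_feat)]
    return fused, (w_img, w_clin)

# Example: equal relevance scores give equal weights (0.5 each),
# so fusion reduces to an element-wise average of the two vectors.
fused, weights = attention_weighted_fusion([2.0, 4.0], [0.0, 0.0], 0.0, 0.0)
```

In a full pipeline, the scalar scores would themselves be learned (e.g. by a small gating network over each modality's embedding), and the fused vector would feed a stage-classification head.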
| Original language | English |
|---|---|
| Article number | 2115 |
| Number of pages | 14 |
| Journal | Cancers |
| Volume | 17 |
| Issue number | 13 |
| DOIs | |
| Publication status | Published - 24 Jun 2025 |
Keywords
- head and neck cancer
- AJCC staging
- vision transformer
- masked autoencoder
- multimodal fusion
- radiomics