A novel recursive transformer-based U-Net architecture for enhanced multi-scale medical image segmentation

Shanshan Li, Xuefeng Liu*, Min Fu, Fouad Khelifi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background:
Automatic medical image segmentation techniques are vital for assisting clinicians in making accurate diagnoses and treatment plans. Although the U-shaped network (U-Net) has been widely adopted in medical image analysis, it still faces challenges in capturing long-range dependencies, particularly in complex and textured medical images where anatomical structures often blend into the surrounding background.

Method:
To address these limitations, a novel network architecture, called recursive transformer-based U-Net (ReT-UNet), which integrates recursive feature learning and transformer technology, is proposed. One of the key innovations of ReT-UNet is the multi-scale global feature fusion (Multi-GF) module, inspired by transformer models and multi-scale pooling mechanisms. This module captures long-range dependencies, enhancing the abstraction and contextual understanding of multi-level features. Additionally, a recursive feature accumulation block is introduced to iteratively update features across layers, improving the network’s ability to model spatial correlations and represent deep features in medical images. To improve sensitivity to local details, a lightweight atrous spatial pyramid pooling (ASPP) module is appended after the Multi-GF module. Furthermore, the segmentation head is redesigned to emphasize feature aggregation and fusion. During the encoding phase, a hybrid pooling layer is employed to ensure comprehensive feature sampling, thereby enabling a broader range of feature representation and improving detailed information learning.
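Two of the building blocks named above, hybrid pooling and atrous (dilated) convolution, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not say how the hybrid pooling layer combines its branches, so the blend of max and average pooling, the weight `alpha`, and the function names `hybrid_pool` and `atrous_conv1d` are all assumptions made for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]


def hybrid_pool(x: Grid, k: int = 2, alpha: float = 0.5) -> Grid:
    """Blend max and average pooling over non-overlapping k x k windows.

    ASSUMPTION: a weighted sum of the two pooling results; the paper's
    hybrid pooling layer may combine them differently.
    """
    rows, cols = len(x) // k, len(x[0]) // k
    out: Grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Collect the k*k values of the current window.
            window = [x[i * k + di][j * k + dj]
                      for di in range(k) for dj in range(k)]
            avg = sum(window) / len(window)
            row.append(alpha * max(window) + (1 - alpha) * avg)
        out.append(row)
    return out


def atrous_conv1d(signal: Sequence[float], kernel: Sequence[float],
                  rate: int) -> List[float]:
    """1-D dilated cross-correlation without padding.

    Kernel taps are spaced `rate` samples apart, so the receptive field
    grows with `rate` while the number of weights stays the same.
    """
    span = (len(kernel) - 1) * rate  # distance from first to last tap
    return [sum(w * signal[i + j * rate] for j, w in enumerate(kernel))
            for i in range(len(signal) - span)]


if __name__ == "__main__":
    print(hybrid_pool([[1.0, 2.0], [3.0, 4.0]]))          # [[3.25]]
    print(atrous_conv1d([1, 2, 3, 4, 5], [1, 1, 1], 2))   # [9]
```

An ASPP-style module typically runs several such dilated convolutions at different rates in parallel and merges their outputs; the sketch shows a single branch only.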

Results:
The proposed method has been evaluated through ablation experiments, demonstrating generally consistent performance across multiple trials. When applied to cardiac, pulmonary nodule, and polyp segmentation datasets, the method showed a reduction in mis-segmented regions. The experimental results suggest that the approach can improve segmentation accuracy and stability compared to competing state-of-the-art methods.
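The abstract does not name the metrics behind "segmentation accuracy". As an illustration only, the sketch below computes the Dice coefficient, a common overlap measure for binary segmentation masks; choosing this metric, and the helper name `dice`, are assumptions rather than details taken from the paper.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    # One overlapping pixel, two predicted, one in the reference: 2*1/(2+1).
    print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
```

Fewer mis-segmented regions raise the overlap term relative to the mask sizes, which is why a metric of this kind moves in the direction the results describe.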

Conclusions:
Experimental findings highlight the superiority of the proposed ReT-UNet over related methods and demonstrate its potential for applications in medical image segmentation.

Original language: English
Article number: 110658
Pages (from-to): 1-18
Number of pages: 18
Journal: Computers in Biology and Medicine
Volume: 196
Issue number: Part A
Early online date: 6 Jul 2025
Publication status: Published - 1 Sept 2025

Keywords

  • Medical image processing
  • Recursive feature learning
  • Transformer technology
  • Multi-scale feature fusion
  • Deep learning
