Abstract
Accurate lesion quantification is a critical component of precision diagnostics and targeted therapeutic strategies, yet current methodologies struggle with the diverse contexts and complicated structures inherent in visible-light medical imaging, including semantic ambiguity, noise interference, and geometric complexity, which collectively hinder segmentation accuracy. To address these challenges, we propose the Multi-Aspect Large Vision Model (MasLVM), a foundational model for optical medical imaging that achieves comprehensive feature fusion through a tri-path design. The Semantic Context Encoder (SCE) integrates a pre-trained large vision model with global semantic embeddings to improve contextual abstraction and mitigate semantic ambiguity. The Spectral Spline Encoder (SSE), built on the Multi-Frequency Feature Modulator (MFFM) and Kolmogorov–Arnold Network (KAN) channel attention, transforms image representations into the frequency domain to selectively attenuate noise while preserving essential structural features. The Hierarchical Deformable Morphometry Encoder (HDME) employs deformable convolutions and multi-scale encoding to dynamically capture heterogeneous geometric structures. The outputs of these branches are synthesized by the Multi-Attention KAN Decoder, which employs KAN-based multiple self-attention and iterative attentional fusion to adaptively select and enhance critical semantic, spectral, and morphological features. Extensive experiments on six widely recognized datasets demonstrate that MasLVM outperforms multiple previous state-of-the-art (SoTA) methods and shows potential for adapting to the diverse requirements of visible-light medical imaging tasks under constrained conditions.
The code and model weights can be directly used for medical task deployment or fine-tuning, and are publicly available at the following link: https://github.com/IMOP-lab/MasLVM-Pytorch.
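The SSE's core idea, filtering feature maps in the frequency domain to suppress noise while keeping low-frequency structure, can be illustrated with a minimal sketch. The function below is a generic fixed low-pass filter, not the paper's learned MFFM (which modulates frequency bands with trainable weights and KAN channel attention); the function name and `keep_ratio` parameter are illustrative assumptions.

```python
import numpy as np

def spectral_denoise(feat, keep_ratio=0.25):
    """Illustrative frequency-domain filtering of a 2-D feature map:
    take the 2-D FFT, zero out all but the lowest `keep_ratio`
    fraction of frequencies (a circular low-pass mask), and invert
    the transform. A fixed binary mask stands in for the learned,
    band-wise modulation described in the abstract."""
    H, W = feat.shape
    # Shift the zero-frequency component to the center of the spectrum.
    F = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.mgrid[0:H, 0:W]
    cy, cx = H // 2, W // 2
    radius = keep_ratio * min(H, W) / 2
    # Keep only frequencies within `radius` of the spectrum center.
    mask = ((yy - cy) ** 2 + (xx - cx) ** 2) <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

For a smooth underlying pattern corrupted by white noise, the low-pass output is closer to the clean signal than the noisy input, since most noise energy lies in the discarded high frequencies.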
| Original language | English |
|---|---|
| Article number | 103385 |
| Journal | Information Fusion |
| Volume | 125 |
| Early online date | 11 Jul 2025 |
| DOIs | |
| Publication status | Published - 1 Jan 2026 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 3: Good Health and Well-being
Keywords
- Colonoscopy
- Foundational models
- Skin cancer detection
- Visible light medical image
Fingerprint
Dive into the research topics of 'Multi-aspect fusion in foundational large vision model for visible light medical imaging segmentation'. Together they form a unique fingerprint.