
Multi-aspect fusion in foundational large vision model for visible light medical imaging segmentation

Xingru Huang, Tianyun Zhang, Zhaoyang Xu, Jian Huang, Gaopeng Huang, Han Yang, Binfeng Zou, Shouqin Ding, Renjie Ruan*, Zhao Huang, Huiyu Zhou, Jin Liu, Zhiwen Zheng, Shaowei Jiang, Xiaoshuai Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Accurate lesion quantification is a critical component of precision diagnostics and targeted therapeutic strategies, yet current methodologies struggle with the diverse contexts and complicated structures inherent in visible-light medical imaging, including semantic ambiguity, noise interference, and geometric complexity, which collectively hinder segmentation accuracy. Targeting these challenges, we propose the Multi-Aspect Large Vision Model (MasLVM), a foundational model for optical medical imaging that achieves comprehensive feature fusion across three parallel paths. The Semantic Context Encoder (SCE) integrates a pre-trained large vision model with global semantic embeddings to improve contextual abstraction and mitigate semantic ambiguities. The Spectral Spline Encoder (SSE), incorporating the Multi-Frequency Feature Modulator (MFFM) and Kolmogorov–Arnold Network (KAN) channel attention, transforms image representations into the frequency domain to selectively attenuate noise while preserving essential structural features. The Hierarchical Deformable Morphometry Encoder (HDME) employs deformable convolutions and multi-scale encoding to dynamically capture heterogeneous geometric structures. The outputs of these branches are synthesized by the Multi-Attention KAN Decoder, which uses KAN-based multiple self-attention and iterative attentional fusion to adaptively select and enhance critical semantic, spectral, and morphological features. Extensive experiments on six widely recognized datasets demonstrate that MasLVM achieves competitive performance against multiple previous state-of-the-art (SoTA) methods and shows potential for adapting to the diverse requirements of visible-light medical imaging tasks under constrained conditions.
The code and model weights can be directly used for medical task deployment or fine-tuning, and are publicly available at the following link: https://github.com/IMOP-lab/MasLVM-Pytorch.
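The SSE's frequency-domain noise attenuation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); the circular low-pass mask and the `keep_ratio` parameter are assumptions chosen for illustration of the general idea: transform a feature map to the frequency domain, retain the low-frequency band that carries structure, and suppress the high-frequency band where noise concentrates.

```python
import numpy as np

def spectral_filter(feat, keep_ratio=0.5):
    """Illustrative frequency-domain filtering of a 2-D feature map.

    Transforms `feat` with a 2-D FFT, keeps only a centered
    low-frequency window whose side length is `keep_ratio` times the
    map size, and inverts the transform. Low frequencies encode
    large-scale structure; discarding the rest attenuates
    high-frequency noise.
    """
    H, W = feat.shape
    # Shift the zero-frequency component to the center of the spectrum.
    F = np.fft.fftshift(np.fft.fft2(feat))
    # Binary low-pass mask: ones in a centered window, zeros elsewhere.
    mask = np.zeros((H, W))
    ch, cw = H // 2, W // 2
    rh, rw = int(H * keep_ratio / 2), int(W * keep_ratio / 2)
    mask[ch - rh:ch + rh, cw - rw:cw + rw] = 1.0
    # Apply the mask and invert the transform; keep the real part.
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

In the paper's design this fixed mask would be replaced by learned, multi-band modulation (the MFFM) and combined with KAN channel attention, so that which frequency bands survive is data-dependent rather than hand-set.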

Original language: English
Article number: 103385
Journal: Information Fusion
Volume: 125
Early online date: 11 Jul 2025
Publication status: Published - 1 Jan 2026

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Colonoscopy
  • Foundational models
  • Skin cancer detection
  • Visible light medical image
