Spiking Vision Transformer with Saccadic Attention

Shuai Wang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Yu Liang, Yimeng Shan, Qian Sun, Enqi Zhang, Malu Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The combination of Spiking Neural Networks (SNNs) and Vision Transformers (ViTs) holds potential for achieving both energy efficiency and high performance, making it particularly suitable for edge vision applications. However, a significant performance gap still exists between SNN-based ViTs and their ANN counterparts. Here, we first analyze why SNN-based ViTs suffer from limited performance and identify a mismatch between the vanilla self-attention mechanism and spatio-temporal spike trains; this mismatch results in degraded spatial relevance and limited temporal interactions. To address these issues, we draw inspiration from biological saccadic attention mechanisms and introduce a Saccadic Spike Self-Attention (SSSA) method. In the spatial domain, SSSA employs a novel spike distribution-based method to effectively assess the relevance between Query and Key pairs in SNN-based ViTs. In the temporal domain, SSSA uses a saccadic interaction module that dynamically focuses on selected visual areas at each timestep and significantly enhances whole-scene understanding through temporal interactions. Building on the SSSA mechanism, we develop an SNN-based Vision Transformer (SNN-ViT). Extensive experiments across various visual tasks demonstrate that SNN-ViT achieves state-of-the-art performance with linear computational complexity. The effectiveness and efficiency of SNN-ViT highlight its potential for power-critical edge vision applications.
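To make two of the abstract's ideas concrete, the sketch below illustrates (a) attention with linear complexity in the number of tokens, obtained by aggregating Key-Value statistics once per timestep instead of forming an N-by-N score map, and (b) a recurrent per-timestep interaction standing in for the saccadic temporal module. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the class name SaccadicSpikeAttention, the threshold binarization standing in for spiking neurons, and the gated temporal state are all illustrative.

import torch
import torch.nn as nn

class SaccadicSpikeAttention(nn.Module):
    # Hypothetical sketch of a spike-driven attention block with linear
    # complexity in token count; not the paper's SSSA implementation.
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        # Simple recurrent gate standing in for the temporal "saccadic"
        # interaction across timesteps (an assumption).
        self.temporal_gate = nn.Linear(dim, dim, bias=False)

    def forward(self, spikes: torch.Tensor) -> torch.Tensor:
        # spikes: [T, B, N, D] spike trains over T timesteps.
        T, B, N, D = spikes.shape
        state = torch.zeros(B, N, D, device=spikes.device)
        outputs = []
        for t in range(T):
            x = spikes[t]
            # Threshold binarization as a stand-in for spiking neurons;
            # real SNN training would need surrogate gradients here.
            q = (self.q_proj(x) > 0).float()
            k = (self.k_proj(x) > 0).float()
            v = self.v_proj(x)
            q = q.view(B, N, self.heads, self.head_dim).transpose(1, 2)
            k = k.view(B, N, self.heads, self.head_dim).transpose(1, 2)
            v = v.view(B, N, self.heads, self.head_dim).transpose(1, 2)
            # Linear-complexity attention: aggregate K^T V once (O(N d^2)),
            # then let each query read from it, avoiding an N x N score map.
            kv = torch.einsum('bhnd,bhne->bhde', k, v)
            y = torch.einsum('bhnd,bhde->bhne', q, kv) / max(N, 1)
            y = y.transpose(1, 2).reshape(B, N, D)
            # Temporal interaction: blend the current "glance" with state
            # accumulated from earlier timesteps.
            state = torch.sigmoid(self.temporal_gate(state)) * state + y
            outputs.append(self.out(state))
        return torch.stack(outputs)  # [T, B, N, D]

For example, SaccadicSpikeAttention(dim=256)(torch.randint(0, 2, (4, 2, 196, 256)).float()) returns a [4, 2, 196, 256] tensor; the per-timestep cost grows linearly with the 196 tokens rather than quadratically.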
Original language: English
Title of host publication: The 13th International Conference on Learning Representations
Publisher: International Conference on Learning Representations (ICLR)
Number of pages: 22
Publication status: Accepted/In press - 22 Jan 2025
Event: The 13th International Conference on Learning Representations - Singapore EXPO, Singapore, Singapore
Duration: 24 Apr 2025 → 28 Apr 2025
Conference number: 13
https://iclr.cc/Conferences/2025

Conference

Conference: The 13th International Conference on Learning Representations
Abbreviated title: ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 24/04/25 → 28/04/25
Internet address: https://iclr.cc/Conferences/2025