HARNet: Human Activity Recognition with Spatial-Temporal Features

Jiguang Li, Meryem Sena Şiltu, Meng Xu, Jiawei Li, Zhao Huang*, Minglei Guan*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Human Activity Recognition (HAR) is pivotal in various domains, including entertainment, security, and healthcare. Conventional methods often exhibit limitations: hierarchical spatial feature extractors capture local spatial structures but struggle with long-term dependencies, temporal dependency modeling units effectively learn sequential patterns but may lose fine-grained spatial details, and multi-perspective sequential attention modules selectively emphasize critical temporal features yet can overlook subtle local variations. To address these challenges, we propose an advanced framework that synergistically integrates these three components, effectively mitigating spatial constraints and enhancing temporal sensitivity. First, the hierarchical spatial feature extractor autonomously distills multi-level spatial representations from raw skeletal data, ensuring robust spatial encoding. Next, the temporal dependency modeling unit captures long-range temporal correlations, preserving essential motion dynamics across time. Finally, the multi-perspective sequential attention module adaptively assigns significance to different time steps, allowing the model to focus on the most informative elements while suppressing redundant information. Extensive experiments on the AIR-Act2Act dataset demonstrate the superiority of the proposed framework, achieving 99.40% accuracy on dataset 1 and 98.72% on dataset 2, significantly surpassing traditional spatial (92.02%) and temporal models (96.40%) as well as other state-of-the-art approaches (98.0%).
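The abstract does not give architectural details, so the following is only a minimal sketch of the last stage it describes: weighting time steps by importance with several attention heads. It is not the authors' implementation. The function names, the fixed per-head query vectors (which would be learned in a real model), and the toy `frames` sequence (standing in for the output of the temporal modeling unit) are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Scaled dot-product attention over time steps for a single head.

    frames: list of T feature vectors (each of length d)
    query:  one vector of length d (hypothetical; learned in practice)
    Returns the weighted sum of the frames and the per-step weights.
    """
    d = len(query)
    scores = [
        sum(f_i * q_i for f_i, q_i in zip(frame, query)) / math.sqrt(d)
        for frame in frames
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[j] for w, frame in zip(weights, frames))
        for j in range(d)
    ]
    return pooled, weights


def multi_head_pool(frames, queries):
    """Split the feature dimension into len(queries) heads, pool each
    head separately over time, and concatenate the pooled vectors."""
    n_heads = len(queries)
    d_head = len(frames[0]) // n_heads
    pooled, all_weights = [], []
    for h, query in enumerate(queries):
        chunk = [f[h * d_head:(h + 1) * d_head] for f in frames]
        p, w = attention_pool(chunk, query)
        pooled.extend(p)
        all_weights.append(w)
    return pooled, all_weights


# Toy sequence: 4 time steps, 4 features per step (illustrative values only).
frames = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
]
# Two heads, each with a fixed 2-dimensional query (assumed, not learned).
queries = [[1.0, 0.0], [0.0, 1.0]]

pooled, weights = multi_head_pool(frames, queries)
```

In this toy run each head places most of its weight on a different time step, which is the behavior the abstract attributes to the multi-perspective module: several views of the sequence, each emphasizing the steps most informative to it.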

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2025 - 34th International Conference on Artificial Neural Networks, 2025, Proceedings
Editors: Walter Senn, Marcello Sanguineti, Ausra Saudargiene, Igor V. Tetko, Alessandro E. P. Villa, Viktor Jirsa, Yoshua Bengio
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 1-13
Number of pages: 13
ISBN (Electronic): 9783032045461
ISBN (Print): 9783032045454
Publication status: Published - 11 Sept 2025
Event: 34th International Conference on Artificial Neural Networks, ICANN 2025 - Kaunas, Lithuania
Duration: 9 Sept 2025 to 12 Sept 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16069 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 34th International Conference on Artificial Neural Networks, ICANN 2025
Country/Territory: Lithuania
City: Kaunas
Period: 9/09/25 to 12/09/25

Keywords

  • Convolutional Neural Networks
  • Human Activity Recognition
  • Long Short-Term Memory
  • Multi-Head Attention Mechanisms
