3D Action Recognition Using Multi-Temporal Depth Motion Maps and Fisher Vector

Chen Chen, Mengyuan Liu, Baochang Zhang, Jungong Han, Junjun Jiang, Hong Liu

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

67 Citations (Scopus)

Abstract

This paper presents an effective local spatio-temporal descriptor for action recognition from depth video sequences. The descriptor accounts for both shape discrimination and action speed variations, so that distinguishing different pose shapes and identifying actions performed at different speeds are addressed together. The algorithm is carried out in three stages. In the first stage, a depth sequence is divided into temporally overlapping depth segments, which are used to generate three depth motion maps (DMMs) capturing the shape and motion cues. To cope with speed variations in actions, depth segments of multiple frame lengths are used, leading to a multi-temporal DMM representation. In the second stage, all the DMMs are partitioned into dense patches, and the local binary patterns (LBP) descriptor is used to characterize local rotation-invariant texture information in those patches. In the third stage, the Fisher kernel is employed to encode the patch descriptors into a compact feature representation, which is fed into a kernel-based extreme learning machine classifier. Extensive experiments on the public MSRAction3D, MSRGesture3D and DHA datasets show that the proposed method outperforms state-of-the-art approaches for depth-based action recognition.
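
The first stage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the DMM formula, so this example assumes the commonly used definition (accumulated absolute differences between consecutive frames) and covers only the front-view map; the side and top projections, the LBP step, the Fisher encoding, and the classifier are omitted. The function names, the segment lengths `(4, 8)`, and the 50% overlap are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np


def split_segments(num_frames, seg_len, step):
    """Return (start, end) index pairs of temporally overlapping segments."""
    if num_frames < seg_len:
        # Sequence shorter than one segment: use it as a single segment.
        return [(0, num_frames)]
    return [(s, s + seg_len) for s in range(0, num_frames - seg_len + 1, step)]


def front_dmm(segment):
    """Front-view DMM (assumed definition): sum of absolute frame differences."""
    diffs = np.abs(np.diff(segment.astype(np.float64), axis=0))
    return diffs.sum(axis=0)


def multi_temporal_dmms(depth, seg_lens=(4, 8), overlap=0.5):
    """Compute front-view DMMs for several segment lengths (hypothetical defaults)."""
    dmms = []
    for seg_len in seg_lens:
        # The step between segment starts controls how much neighbours overlap.
        step = max(1, int(seg_len * (1 - overlap)))
        for start, end in split_segments(depth.shape[0], seg_len, step):
            dmms.append(front_dmm(depth[start:end]))
    return dmms


# Synthetic stand-in for a depth video: 16 frames of 24 x 32 pixels.
rng = np.random.default_rng(0)
depth = rng.integers(0, 4000, size=(16, 24, 32))
dmms = multi_temporal_dmms(depth)
```

Short segments respond to fast motion while long segments accumulate slower motion, which is the intuition behind using several segment lengths at once.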
Original language: English
Title of host publication: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 3331-3337
ISBN (Print): 978-1-57735-771-1
Publication status: Published - 2016
