Saliency-Informed Spatio-Temporal Vector of Locally Aggregated Descriptors and Fisher Vector for Visual Action Recognition

Zheming Zuo, Daniel Organisciak, Hubert P. H. Shum, Longzhi Yang

Research output: Contribution to conference › Paper › peer-review

Abstract

Feature encoding has been extensively studied for the task of visual action recognition (VAR). The recently proposed super vector-based encoding methods, such as the Vector of Locally Aggregated Descriptors (VLAD) and the Fisher Vector (FV), have significantly improved recognition performance. Despite this success, they still struggle with the superfluous information present during the training stage, which makes them computationally expensive when applied to a large number of extracted features. To address this challenge, this paper proposes a Saliency-Informed Spatio-Temporal VLAD (SST-VLAD) approach, which selects the extracted features corresponding to a small number of videos in the data set by considering both spatial and temporal video-wise saliency scores; the same extension principle is also applied to the FV approach. The experimental results indicate that the proposed feature encoding scheme consistently outperforms existing ones at a significantly lower computational cost.
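The two ingredients named in the abstract, video selection by saliency and VLAD encoding, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the spatial and temporal scores are computed or combined, so the simple sum used here, the function names `select_salient_videos` and `vlad_encode`, and the `keep` parameter are all assumptions made for illustration. The VLAD step follows the standard formulation (sum of residuals to the nearest codebook center, then L2 normalization) and uses only the Python standard library.

```python
import math


def select_salient_videos(spatial, temporal, keep):
    """Return the indices of the `keep` videos with the highest combined score.

    Assumption: the two video-wise saliency scores are simply added.
    The abstract does not specify the actual combination rule.
    """
    combined = [s + t for s, t in zip(spatial, temporal)]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:keep])


def vlad_encode(descriptors, centers):
    """Standard VLAD: accumulate residuals to the nearest center, then L2-normalize."""
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest center (squared Euclidean distance).
        nearest = min(
            range(len(centers)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
        )
        for j in range(dim):
            residual_sums[nearest][j] += x[j] - centers[nearest][j]
    # Concatenate the per-center residual sums into one vector.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


if __name__ == "__main__":
    # Hypothetical per-video saliency scores for three videos.
    kept = select_salient_videos([0.1, 0.9, 0.5], [0.2, 0.8, 0.9], keep=2)
    print("videos kept:", kept)

    # Toy 2-D descriptors and a two-center codebook.
    code = vlad_encode([[1, 0], [0, 1], [11, 10]], [[0, 0], [10, 10]])
    print("VLAD vector:", code)
```

In this sketch, only descriptors from the selected videos would be passed on to codebook learning and encoding, which is where the reduction in training cost would come from.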
Original language: English
Publication status: Published - 3 Sept 2018
Event: BMVC 2018 - British Machine Vision Conference
Duration: 3 Sept 2018 - 6 Sept 2018
http://www.bmvc.org
