Retrieving Human Actions Using Spatio-Temporal Features and Relevance Feedback

Rui Jin, Ling Shao

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review


In this paper, we extend the idea of 2D object retrieval to 3D human action retrieval and present a solution based on spatio-temporal features. The framework of the action retrieval engine builds on a spatio-temporal interest point detector and the bag-of-words representation. For describing action features, we observe that the appearance and structural features of interest points provide complementary information. We therefore propose combining brightness gradients with 3D shape context to increase the discriminative power of the descriptors. Experiments on the KTH dataset demonstrate the advantage of this method. As an extension, we apply the interest-point-based action retrieval technique to realistic actions in movies. Because actions in movies are highly complex owing to background variation, scale differences, and differences in performers’ appearance, localizing and describing them is difficult. The results show that our method is computationally efficient and achieves reasonable accuracy in these challenging scenarios. We believe this work will be helpful for further research on action retrieval techniques.
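The bag-of-words retrieval pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the interest point detector is omitted, the descriptors are random stand-ins for brightness-gradient and 3D shape context features, and every function name, dimension, and parameter below (kmeans, combine, bow_histogram, retrieve, the codebook size of 5) is an assumption chosen for illustration. Plain k-means and Euclidean ranking are used as simple placeholders for whatever clustering and similarity measure the chapter actually employs.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means used here as a stand-in codebook learner (assumption)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every descriptor to every centre.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def combine(appearance, structure):
    """L2-normalise each descriptor type per point, then concatenate them."""
    def l2(a):
        return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    return np.hstack([l2(appearance), l2(structure)])

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word; return a normalised histogram."""
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def retrieve(query_hist, db_hists):
    """Rank database clips by Euclidean distance to the query histogram."""
    dists = np.linalg.norm(db_hists - query_hist, axis=1)
    return np.argsort(dists)

# Hypothetical data: 3 clips, 30 interest points each,
# 8-D "appearance" and 6-D "structure" descriptors (random placeholders).
rng = np.random.default_rng(1)
clips = [combine(rng.normal(size=(30, 8)), rng.normal(size=(30, 6)))
         for _ in range(3)]

codebook = kmeans(np.vstack(clips), k=5)
db_hists = np.array([bow_histogram(c, codebook) for c in clips])

# Query with the histogram of the second clip and print the ranking.
print(retrieve(db_hists[1], db_hists))
```

Normalising each descriptor type before concatenation is one simple way to keep either feature from dominating the distance computation; other weighting schemes are equally plausible.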
Original language: English
Title of host publication: Multimedia Interaction and Intelligent User Interfaces
Editors: Ling Shao, Caifeng Shan, Jiebo Luo, Minoru Etoh
Place of Publication: London
Number of pages: 302
ISBN (Print): 9781849965064
Publication status: Published - 2010

Publication series

Name: Advances in Computer Vision and Pattern Recognition
ISSN (Electronic): 2191-6586

