In this paper, we extend the idea of 2D object retrieval to 3D human action retrieval and present a solution for action retrieval with spatio-temporal features. The framework of this action retrieval engine is based on a spatio-temporal interest point detector and the bag-of-words representation. For describing action features, we observe that the appearance features and structural features extracted from interest points provide complementary information. We therefore propose combining the brightness gradient with 3D shape context to increase the discriminative power of the descriptors. Experiments carried out on the KTH dataset demonstrate the advantage of this method. As an extension of this work, we apply the interest-point-based action retrieval technique to realistic actions in movies. Because actions in movies are very complex owing to background variation, scale differences, and variation in performers’ appearance, localizing and describing them is difficult. The results show that our method is computationally efficient and achieves reasonable accuracy in these challenging scenarios. We believe that our work will be helpful for further research on action retrieval techniques.
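To make the bag-of-words retrieval idea above concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the appearance and structural descriptors are fused by simple concatenation, that the codebook has already been learned (for example by clustering), and that videos are ranked by cosine similarity of their histograms; the abstract specifies none of these details. All function names, the toy codebook, and the toy descriptor values are hypothetical.

```python
import math

def combine(appearance, structure):
    """Fuse two descriptors by concatenation (an assumed fusion rule)."""
    return list(appearance) + list(structure)

def nearest_word(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )

def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of codeword occurrences for one video."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def cosine(a, b):
    """Cosine similarity between two histograms (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_hist, database):
    """Database keys sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: cosine(query_hist, database[name]),
                  reverse=True)

# Hypothetical two-word codebook over 4-D fused descriptors.
codebook = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

# Toy per-interest-point descriptors (appearance part, structural part).
walk = [combine([0.1, 0.0], [0.0, 0.1]),
        combine([0.9, 1.0], [1.0, 0.8]),
        combine([0.0, 0.1], [0.1, 0.0])]
run = [combine([1.0, 0.9], [0.9, 1.0]),
       combine([0.8, 1.0], [1.0, 1.0])]
query = [combine([0.0, 0.0], [0.1, 0.1]),
         combine([0.1, 0.1], [0.0, 0.0]),
         combine([1.0, 1.0], [1.0, 1.0])]

database = {"walk": bow_histogram(walk, codebook),
            "run": bow_histogram(run, codebook)}
ranking = rank(bow_histogram(query, codebook), database)
print(ranking)
```

In this toy setting the query shares its codeword distribution with the "walk" clip, so that clip is ranked first. A real system would extract the descriptors around detected spatio-temporal interest points and use a much larger codebook.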
|Title of host publication||Multimedia Interaction and Intelligent User Interfaces|
|Editors||Ling Shao, Caifeng Shan, Jiebo Luo, Minoru Etoh|
|Place of Publication||London|
|Number of pages||302|
|Publication status||Published - 2010|
|Name||Advances in Computer Vision and Pattern Recognition|