TY - JOUR
T1 - Spatio-Temporal Laplacian Pyramid Coding for Action Recognition
AU - Shao, Ling
AU - Zhen, Xiantong
AU - Tao, Dacheng
AU - Li, Xuelong
PY - 2014/6
Y1 - 2014/6
N2 - We present a novel descriptor, called spatiotemporal Laplacian pyramid coding (STLPC), for holistic representation of human actions. In contrast to sparse representations based on detected local interest points, STLPC regards a video sequence as a whole with spatio-temporal features directly extracted from it, which prevents the loss of information in sparse representations. Through decomposing each sequence into a set of band-pass-filtered components, the proposed pyramid model localizes features residing at different scales, and therefore is able to effectively encode the motion information of actions. To make features further invariant and resistant to distortions as well as noise, a bank of 3-D Gabor filters is applied to each level of the Laplacian pyramid, followed by max pooling within filter bands and over spatio-temporal neighborhoods. Since the convolving and pooling are performed spatio-temporally, the coding model can capture structural and motion information simultaneously and provide an informative representation of actions. The proposed method achieves superb recognition rates on the KTH, the multiview IXMAS, the challenging UCF Sports, and the newly released HMDB51 datasets. It outperforms state of the art methods showing its great potential on action recognition.
AB - We present a novel descriptor, called spatiotemporal Laplacian pyramid coding (STLPC), for holistic representation of human actions. In contrast to sparse representations based on detected local interest points, STLPC regards a video sequence as a whole with spatio-temporal features directly extracted from it, which prevents the loss of information in sparse representations. Through decomposing each sequence into a set of band-pass-filtered components, the proposed pyramid model localizes features residing at different scales, and therefore is able to effectively encode the motion information of actions. To make features further invariant and resistant to distortions as well as noise, a bank of 3-D Gabor filters is applied to each level of the Laplacian pyramid, followed by max pooling within filter bands and over spatio-temporal neighborhoods. Since the convolving and pooling are performed spatio-temporally, the coding model can capture structural and motion information simultaneously and provide an informative representation of actions. The proposed method achieves superb recognition rates on the KTH, the multiview IXMAS, the challenging UCF Sports, and the newly released HMDB51 datasets. It outperforms state of the art methods showing its great potential on action recognition.
KW - Action recognition
KW - computer vision
KW - max pooling
KW - spatio-temporal Laplacian pyramid
UR - http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6572804
U2 - 10.1109/TCYB.2013.2273174
DO - 10.1109/TCYB.2013.2273174
M3 - Article
SN - 2168-2267
VL - 44
SP - 817
EP - 827
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 6
ER -