Structure-Preserving Binary Representations for RGB-D Action Recognition

Mengyang Yu, Li Liu, Ling Shao

Research output: Contribution to journal › Article › peer-review

94 Citations (Scopus)
15 Downloads (Pure)


In this paper, we propose a novel binary local representation for RGB-D video data fusion with a structure-preserving projection. Our contribution is twofold. First, to acquire a general feature for the video data, we convert the problem to describing the gradient fields of the RGB and depth information of video sequences. From the local fluxes of the gradient fields, which include the orientation and the magnitude of the neighborhood of each point, a new kind of continuous local descriptor called the Local Flux Feature (LFF) is obtained. Second, the LFFs from the RGB and depth channels are fused into a Hamming space via the Structure Preserving Projection (SPP). Specifically, an orthogonal projection matrix is applied to preserve the pairwise structure, with a shape constraint to avoid the collapse of the data structure in the projected space. Furthermore, a bipartite graph structure of the data is taken into consideration, which is regarded as a higher-level connection between samples and classes than the pairwise structure of local features. Extensive experiments show not only the high efficiency of binary codes and the effectiveness of combining LFFs from RGB-D channels via SPP on various action recognition benchmarks of RGB-D data, but also the potential power of LFF for general action recognition.
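
The general idea of mapping continuous descriptors into a Hamming space with an orthogonal projection can be illustrated with a minimal sketch. This is not the SPP method described above: the projection here is a random orthonormal matrix rather than one learned under pairwise, shape, and bipartite-graph constraints, the descriptors are random stand-ins for LFFs, and fusing the two channels by concatenation is an assumption made only for illustration. All function and variable names are hypothetical.

```python
import numpy as np


def random_orthogonal_projection(dim, n_bits, seed=0):
    """Return a dim x n_bits matrix with orthonormal columns.

    Hypothetical stand-in for a learned projection: the columns come from
    the QR decomposition of a random Gaussian matrix.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q[:, :n_bits]


def binarize(features, projection):
    """Centre the features, project them, and keep the sign of each value as a bit."""
    centred = features - features.mean(axis=0)
    return (centred @ projection > 0).astype(np.uint8)


def hamming_distance(codes_a, codes_b):
    """Pairwise Hamming distances between two sets of binary codes."""
    return (codes_a[:, None, :] != codes_b[None, :, :]).sum(axis=2)


# Random placeholders for continuous local descriptors from each channel.
rng = np.random.default_rng(1)
rgb_desc = rng.standard_normal((5, 16))
depth_desc = rng.standard_normal((5, 16))

# Assumption: fuse the two channels by simple concatenation before projecting.
fused = np.hstack([rgb_desc, depth_desc])

P = random_orthogonal_projection(fused.shape[1], n_bits=8)
codes = binarize(fused, P)
dist = hamming_distance(codes, codes)
```

Comparing binary codes requires only bit comparisons and counts, which is the source of the efficiency that binary representations offer over continuous descriptors at matching time.
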
Original language: English
Pages (from-to): 1651-1664
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Issue number: 8
Early online date: 15 Oct 2015
Publication status: Published - 1 Aug 2016

