We present a system that classifies gestures from a single training example. The input is dual-modality: RGB and depth streams from a Kinect sensor. The system applies morphological denoising to the depth images and automatically determines the temporal boundaries of each gesture. Features are extracted with an Extended Motion History Image (Extended-MHI), and the Multi-view Spectral Embedding (MSE) algorithm fuses the two modalities in a physically meaningful manner. Our approach achieves a Levenshtein distance below 0.3 on the CHALEARN Gesture Challenge validation batches.
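The reported metric can be illustrated with a minimal sketch. It assumes each video is labeled with a sequence of gesture IDs, and that the score is the sum of per-video Levenshtein distances divided by the total number of gestures in the ground truth. That normalization is our reading of the challenge's evaluation, not something stated in the abstract. The function names `levenshtein` and `normalized_score` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_score(predictions, truths):
    """Assumed scoring: total edit distance over total number of true gestures."""
    total_distance = sum(levenshtein(p, t) for p, t in zip(predictions, truths))
    total_gestures = sum(len(t) for t in truths)
    return total_distance / total_gestures


# Two videos: one spurious gesture in the first, one missed gesture in the second.
predicted = [[1, 2, 3], [4]]
truth = [[1, 3], [4, 5]]
print(normalized_score(predicted, truth))  # 0.5
```

Under this assumed normalization, a score below 0.3 would correspond to fewer than three edit errors for every ten true gestures.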
|Publication status||Published - Jun 2012|
|Event||CVPRW 2012 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops - Providence, USA|
Duration: 1 Jun 2012 → …