We present a system that classifies gestures from only one learning example. The input is dual-modality, i.e., RGB and depth images from a Kinect sensor. Our system performs morphological denoising on the depth images and automatically segments the temporal boundaries between gestures. Features are extracted with the Extended Motion History Image (Extended-MHI), and the Multi-view Spectral Embedding (MSE) algorithm fuses the two modalities in a physically meaningful manner. Our approach achieves a Levenshtein distance below 0.3 on the CHALEARN Gesture Challenge validation batches.
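The abstract names two concrete processing steps: morphological denoising of the depth stream and motion-history features. The sketch below is not the authors' implementation; it illustrates both steps with OpenCV, using the classic MHI recursion that Extended-MHI builds on. The function names, threshold values, and kernel size are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

def denoise_depth(depth, kernel_size=5):
    """Suppress speckle noise in a depth frame with morphological opening.
    kernel_size is an assumed value, not taken from the paper."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    return cv2.morphologyEx(depth, cv2.MORPH_OPEN, kernel)

def update_mhi(mhi, prev, curr, tau=30, motion_thresh=10):
    """One step of the classic MHI recursion (Extended-MHI augments this):
    H(x, y, t) = tau                    if |curr - prev| > motion_thresh
               = max(0, H(x, y, t-1)-1) otherwise
    """
    diff = cv2.absdiff(curr, prev)        # per-pixel frame difference
    motion = diff > motion_thresh         # boolean motion mask
    mhi = np.maximum(mhi - 1, 0)          # decay old motion history
    mhi[motion] = tau                     # stamp freshly moving pixels
    return mhi

# Usage over a list of 8-bit depth frames `frames` (hypothetical input):
# mhi = np.zeros(frames[0].shape, dtype=np.int32)
# prev = denoise_depth(frames[0])
# for f in frames[1:]:
#     curr = denoise_depth(f)
#     mhi = update_mhi(mhi, prev, curr)
#     prev = curr
```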
|Published - Jun 2012
|CVPRW 2012 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops - Providence, USA
Duration: 1 Jun 2012 → …