TY - GEN
T1 - A Two-Stream Recurrent Network for Skeleton-based Human Interaction Recognition
AU - Men, Qianhui
AU - Ho, Edmond S. L.
AU - Shum, Hubert
AU - Leung, Howard
N1 - Funding information: The project is supported in part by grants from City University of Hong Kong (Project No. 9220077 and 9678139), and the Royal Society (Ref: IES\R2\181024 and IES\R1\191147).
PY - 2021/1/10
Y1 - 2021/1/10
N2 - This paper addresses the problem of recognizing human-human interaction from skeletal sequences. Existing methods are mainly designed to classify single human action. Many of them simply stack the movement features of two characters to deal with human interaction, while neglecting the abundant relationships between characters. In this paper, we propose a novel two-stream recurrent neural network by adopting the geometric features from both single actions and interactions to describe the spatial correlations with different discriminative abilities. The first stream is constructed under pairwise joint distance (PJD) in a fully-connected mesh to categorize the interactions with explicit distance patterns. To better distinguish similar interactions, in the second stream, we combine PJD with the spatial features from individual joint positions using graph convolutions to detect the implicit correlations among joints, where the joint connections in the graph are adaptive for flexible correlations. After spatial modeling, each stream is fed to a bi-directional LSTM to encode two-way temporal properties. To take advantage of the diverse discriminative power of the two streams, we come up with a late fusion algorithm to combine their output predictions concerning information entropy. Experimental results show that the proposed framework achieves state-of-the-art performance on 3D and comparable performance on 2D interaction datasets. Moreover, the late fusion results demonstrate the effectiveness of improving the recognition accuracy compared with single streams.
AB - This paper addresses the problem of recognizing human-human interaction from skeletal sequences. Existing methods are mainly designed to classify single human action. Many of them simply stack the movement features of two characters to deal with human interaction, while neglecting the abundant relationships between characters. In this paper, we propose a novel two-stream recurrent neural network by adopting the geometric features from both single actions and interactions to describe the spatial correlations with different discriminative abilities. The first stream is constructed under pairwise joint distance (PJD) in a fully-connected mesh to categorize the interactions with explicit distance patterns. To better distinguish similar interactions, in the second stream, we combine PJD with the spatial features from individual joint positions using graph convolutions to detect the implicit correlations among joints, where the joint connections in the graph are adaptive for flexible correlations. After spatial modeling, each stream is fed to a bi-directional LSTM to encode two-way temporal properties. To take advantage of the diverse discriminative power of the two streams, we come up with a late fusion algorithm to combine their output predictions concerning information entropy. Experimental results show that the proposed framework achieves state-of-the-art performance on 3D and comparable performance on 2D interaction datasets. Moreover, the late fusion results demonstrate the effectiveness of improving the recognition accuracy compared with single streams.
U2 - 10.1109/icpr48806.2021.9412538
DO - 10.1109/icpr48806.2021.9412538
M3 - Conference contribution
SN - 9781728188096
T3 - 2020 25th International Conference on Pattern Recognition (ICPR)
SP - 2771
EP - 2778
BT - Proceedings of ICPR 2020: 25th International Conference on Pattern Recognition
PB - IEEE
CY - Piscataway, NJ
T2 - International Conference on Pattern Recognition (ICPR2020)
Y2 - 10 January 2021 through 15 January 2021
ER -