TY - JOUR
T1 - High-speed Multi-person Pose Estimation with Deep Feature Transfer
AU - Huang, Ying
AU - Shum, Hubert P. H.
AU - Ho, Edmond S. L.
AU - Aslam, Nauman
PY - 2020/8/1
Y1 - 2020/8/1
N2 - Recent advancements in deep learning have significantly improved the accuracy of multi-person pose estimation from RGB images. However, these deep learning methods typically rely on a large number of deep refinement modules to refine the features of body joints and limbs, which greatly reduce the run-time speed and therefore limit the application domain. In this paper, we propose a feature transfer framework to capture the concurrent correlations between body joint and limb features. The concurrent correlations of these features form a complementary structural relationship, which mutually strengthens the network's inferences and reduces the need for refinement modules. The transfer sub-network is implemented with multiple convolutional layers, and is merged with the body part detection network to form an end-to-end system. The transfer relationship is automatically learned from ground-truth data instead of being manually encoded, resulting in a more general and efficient design. The proposed framework is validated on multiple popular multi-person pose estimation benchmarks: MPII, COCO 2018, and PoseTrack 2017 and 2018. Experimental results show that our method not only significantly increases the inference speed to 73.8 frames per second (FPS), but also attains performance comparable to the state of the art.
AB - Recent advancements in deep learning have significantly improved the accuracy of multi-person pose estimation from RGB images. However, these deep learning methods typically rely on a large number of deep refinement modules to refine the features of body joints and limbs, which greatly reduce the run-time speed and therefore limit the application domain. In this paper, we propose a feature transfer framework to capture the concurrent correlations between body joint and limb features. The concurrent correlations of these features form a complementary structural relationship, which mutually strengthens the network's inferences and reduces the need for refinement modules. The transfer sub-network is implemented with multiple convolutional layers, and is merged with the body part detection network to form an end-to-end system. The transfer relationship is automatically learned from ground-truth data instead of being manually encoded, resulting in a more general and efficient design. The proposed framework is validated on multiple popular multi-person pose estimation benchmarks: MPII, COCO 2018, and PoseTrack 2017 and 2018. Experimental results show that our method not only significantly increases the inference speed to 73.8 frames per second (FPS), but also attains performance comparable to the state of the art.
KW - Deep learning
KW - Feature transfer
KW - Human pose estimation
UR - http://www.scopus.com/inward/record.url?scp=85086565232&partnerID=8YFLogxK
U2 - 10.1016/j.cviu.2020.103010
DO - 10.1016/j.cviu.2020.103010
M3 - Article
AN - SCOPUS:85086565232
SN - 1077-3142
VL - 197-198
JO - Computer Vision and Image Understanding
JF - Computer Vision and Image Understanding
M1 - 103010
ER -