Efficient Spatial Reasoning for Human Pose Estimation

Ying Huang, Shanfeng Hu, Zike Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review



Human pose estimation from single images has made significant progress, but it still faces fundamental challenges from occlusion and overlapping of joints in many cases. This is partly due to a limitation of the traditional paradigm for this problem, which attempts to locate human body joints in isolation and, as a result, can fail to resolve the spatial connections among joints that are critical for identifying the whole pose. To overcome this shortcoming, we propose to explicitly incorporate spatial reasoning into pose estimation by formulating it as a structured graph learning problem, in which each image pixel is a candidate graph node and every two nodes are connected via an edge that captures their affinity. The advantage of this representation is that it allows us to learn feature embeddings for both the nodes and the edges, thereby providing sufficient capacity to delineate correct human body joints and their connecting bones. To facilitate efficient learning and inference, we exploit self-attention transformer architectures that fuse the node and edge learning pathways, which reduces parameter counts and permits fast computation. Experiments on the popular MS-COCO human pose estimation benchmark show that our method outperforms representative methods.
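The fused node-and-edge attention idea from the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name, shapes, and the choice to derive edge embeddings from concatenated attended node features are all illustrative assumptions; it only shows how one shared attention pathway can update per-pixel node features and pairwise edge embeddings together.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fused_node_edge_attention(nodes, Wq, Wk, Wv, We):
    """One fused self-attention step over candidate joint (node) features.

    Hypothetical sketch of the paper's idea, not the published model:
    nodes: (N, d) features, one row per candidate pixel/node.
    Wq, Wk, Wv: (d, d) projections shared by both pathways, so no
    separate edge network is needed (saving parameters).
    We: (2*d, d) projection producing an edge embedding per node pair.
    Returns updated node features (N, d) and edge embeddings (N, N, d).
    """
    N, d = nodes.shape
    q, k, v = nodes @ Wq, nodes @ Wk, nodes @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))          # (N, N) pairwise affinities
    new_nodes = attn @ v                          # each node aggregates over all pixels
    # Edge embedding for pair (i, j) from the concatenated attended node
    # features, reusing the same pathway (the "fused" part of the sketch).
    pair = np.concatenate(
        [np.repeat(new_nodes[:, None, :], N, axis=1),   # feature of node i
         np.repeat(new_nodes[None, :, :], N, axis=0)],  # feature of node j
        axis=-1)                                        # (N, N, 2d)
    edges = pair @ We                                   # (N, N, d)
    return new_nodes, edges
```

Because the edge embeddings are computed from the already-attended node features rather than by a second attention stack, the pairwise affinities come almost for free once the node pathway has run, which is one plausible reading of how fusing the pathways saves parameters and computation.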
Original language: English
Title of host publication: The 33rd British Machine Vision Conference Proceedings
Place of publication: Durham
Publisher: The British Machine Vision Association and Society for Pattern Recognition
Number of pages: 14
Publication status: Published - 25 Nov 2022
Event: BMVC 2022: The 33rd British Machine Vision Conference - hybrid, The Kia Oval, London, United Kingdom
Duration: 21 Nov 2022 - 24 Nov 2022


Country/Territory: United Kingdom


