Spatio-temporal context network for 3D human pose estimation based on graph attention
Zhengdong ZENG, Ming ZHAO
Journal of Computer Applications    2025, 45 (10): 3161-3169.   DOI: 10.11772/j.issn.1001-9081.2024101489

Recent research on human pose estimation shows that fully exploiting the latent spatial information of 2D poses to obtain representative features yields more accurate 3D pose results. Therefore, a spatio-temporal context network based on the graph attention mechanism was proposed, comprising a Temporal Context Network with Shifted windows (STCN), an Extremity-Guided global graph ATtention network (EGAT), and a Pose Grammar-based local graph attention Convolution Network (PGCN). Firstly, STCN was used to transform long sequences of 2D joint positions into latent features of a single-frame human pose, aggregating and exploiting long-range and short-range pose information effectively while reducing the computational cost significantly. Secondly, EGAT was proposed to compute global spatial context efficiently: human extremities were treated as "traffic hubs", and bridges were established for information exchange between them and the other nodes. Thirdly, a graph attention mechanism was employed to assign weights adaptively when computing the global context over human joints. Finally, PGCN was designed with a Graph Convolution Network (GCN) to compute and model local spatial context, emphasizing the motion consistency of symmetric human joints and the motion-correlation structure of human bones. The proposed model was evaluated on two challenging benchmark datasets: Human3.6M and HumanEva-I. Experimental results demonstrate its superior performance: with an input length of 81 frames, the model achieves a Mean Per Joint Position Error (MPJPE) of 43.5 mm on the Human3.6M dataset, a 10.5% reduction compared with the state-of-the-art MCFNet (Multi-scale Cross Fusion Network).
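The adaptive weight assignment described above can be illustrated with a minimal, self-contained sketch of graph attention over a skeleton graph. This is not the paper's EGAT/PGCN implementation: the skeleton edges, feature vectors, and dot-product scoring below are simplifying assumptions chosen only to show how each joint aggregates neighbour features with softmax-normalized attention weights.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def graph_attention(features, neighbours):
    """One attention pass: joint i attends over itself and neighbours[i].

    features   -- list of per-joint feature vectors
    neighbours -- dict: joint index -> list of adjacent joint indices
    """
    out = []
    for i, fi in enumerate(features):
        attended = [i] + neighbours[i]
        # dot-product scores between joint i and each attended joint
        scores = [sum(a * b for a, b in zip(fi, features[j])) for j in attended]
        weights = softmax(scores)  # adaptive weights, sum to 1
        # weighted aggregation of neighbour features, dimension by dimension
        agg = [sum(w * features[j][d] for w, j in zip(weights, attended))
               for d in range(len(fi))]
        out.append(agg)
    return out

# toy 4-joint chain (hypothetical, e.g. hip -> knee -> ankle -> toe)
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
updated = graph_attention(features, neighbours)
```

Because the attention weights are a convex combination, each updated joint feature stays within the range spanned by the features it attended to; stacking such layers (as in a GCN-style local module) widens each joint's receptive field over the skeleton.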
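The reported metric, MPJPE, is the mean over joints of the Euclidean distance between predicted and ground-truth 3D joint positions (in millimetres). A short sketch with illustrative, made-up coordinates:

```python
import math

def mpjpe(pred, gt):
    """Mean Per Joint Position Error over matched 3D joint lists."""
    assert len(pred) == len(gt)
    total = 0.0
    for (px, py, pz), (gx, gy, gz) in zip(pred, gt):
        # Euclidean distance for this joint
        total += math.sqrt((px - gx) ** 2 + (py - gy) ** 2 + (pz - gz) ** 2)
    return total / len(pred)

# toy example: first joint is off by a 3-4-5 triangle (5 mm), second is exact
pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
error = mpjpe(pred, gt)  # (5.0 + 0.0) / 2 = 2.5
```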
