Recent research on human pose estimation indicates that fully exploiting the latent spatial information of 2D poses to obtain representative features leads to more accurate 3D pose results. Therefore, a spatio-temporal context network based on the graph attention mechanism was proposed, which consists of a Temporal Context Network with Shifted windows (STCN), an Extremity-Guided global graph ATtention mechanism network (EGAT), and a Pose Grammar-based local graph attention Convolution Network (PGCN). Firstly, STCN was used to transform the 2D joint positions of a long sequence into latent human-pose features of a single sequence, which aggregated and exploited long-range and short-range pose information effectively while significantly reducing the computational cost. Secondly, EGAT was presented to compute the global spatial context efficiently: the human extremities were treated as “traffic hubs”, and bridges were established for information exchange between them and the other joint nodes. Thirdly, the graph attention mechanism was employed to assign adaptive weights when computing the global context over the human joints. Finally, PGCN was designed to compute and model the local spatial context with a Graph Convolution Network (GCN), thereby emphasizing the motion consistency of symmetric human joints and the motion correlation structure of the human skeleton. The proposed model was evaluated on two challenging benchmark datasets: Human3.6M and HumanEva-I. Experimental results demonstrate that the proposed model achieves superior performance: with an input length of 81 frames, it attains a Mean Per Joint Position Error (MPJPE) of 43.5 mm on the Human3.6M dataset, a 10.5% reduction compared with the state-of-the-art algorithm MCFNet (Multi-scale Cross Fusion Network), showing higher accuracy.
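To make the adaptive-weight idea behind the global graph attention concrete, the following is a minimal, illustrative sketch of a single-head graph-attention layer over human joints, in the style of a standard GAT. It is not the paper's EGAT implementation; the class name, joint count (17), and feature dimensions are assumptions chosen only for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointGraphAttention(nn.Module):
    """Single-head graph attention over human joints (illustrative sketch).

    Every joint attends to every other joint, and the attention weights are
    learned adaptively, as in a standard GAT layer. This is an assumption-based
    stand-in, not the paper's exact EGAT module.
    """
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)   # per-joint feature projection
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)    # scores a pair of joint features

    def forward(self, x):
        # x: (batch, num_joints, in_dim) latent features per joint
        h = self.proj(x)                                      # (B, J, D)
        B, J, D = h.shape
        hi = h.unsqueeze(2).expand(B, J, J, D)                # query joint i
        hj = h.unsqueeze(1).expand(B, J, J, D)                # key joint j
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1))  # (B, J, J)
        alpha = torch.softmax(e, dim=-1)                      # adaptive weight of joint j for joint i
        return alpha @ h                                      # weighted aggregation of global context

# Usage: a 17-joint skeleton with 64-dimensional latent features (hypothetical sizes)
x = torch.randn(2, 17, 64)
layer = JointGraphAttention(64, 64)
out = layer(x)   # (2, 17, 64)
```

Because the attention weights are computed from the joint features themselves rather than fixed by the skeleton topology, distant joints such as the extremities can exchange information directly, which is the behavior the abstract attributes to treating extremities as “traffic hubs”.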