Journal of Computer Applications (《计算机应用》), sole official website


Motion capture and model driving method based on 3D pose estimation

Wan Minghua (万鸣华)1, Tian Yuqing (田雨卿)1, Yang Guowei (杨国为)2

  1. Nanjing Audit University
  2. Qingdao University
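  • Received: 2025-01-20 Revised: 2025-04-30 Released: 2025-05-26 Published: 2025-05-26
  • Corresponding author: Wan Minghua (万鸣华)
  • Funding:
    Research on Design Methods for High-Performance Open-Set Recognition Classifiers Based on Tight-Wrapping Learning; 2024 Jiangsu Province Postgraduate Research and Practice Innovation Program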

A new motion capture driven method based on 3D pose estimation

  • Received: 2025-01-20 Revised: 2025-04-30 Online: 2025-05-26 Published: 2025-05-26

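Abstract: A variety of 3D human pose estimation models have been applied to monocular-vision motion capture. However, the output of a pose estimation model contains only the spatial coordinates of keypoints and carries no joint rotation, so it cannot be used directly to drive a human body model. The algorithm proposed in this paper maps the pose estimation results onto the bone nodes of a 3D human model's skeleton and computes the bone rotations in order to drive the 3D human model. First, the keypoint coordinates produced by pose estimation are interpolated to locate keypoints that were not predicted. Then an intermediate matrix is used to align the skeletons of different models, which keeps the method applicable across models. Finally, a Kalman filter and a low-pass filter smooth the human pose to remove the random noise and high-frequency noise that arise when the motion input and the 3D scene frame rate are out of sync. In tests based on the BlazePose pose estimation model and human pose estimation datasets such as Human3.6M, the proposed driving method performs well when driving common 3D human models with proportions from three to nine heads tall. For a video of the same motion sequence, compared with feeding the coordinates directly into the 3D scene, the mean squared error of the frame sequence output through the Kalman filter and low-pass filter falls from 2.871 to 0.831, indicating that model jitter and popping are markedly reduced.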

Keywords: motion capture, 3D pose estimation, 3D model driving, skeletal animation, pose smoothing

Abstract: At present, various 3D human pose estimation models have been applied to monocular visual motion capture. However, the results of pose estimation models contain only the spatial coordinates of keypoints and provide no joint rotation, so they cannot be used directly to drive human models. The proposed algorithm maps the results of a pose estimation algorithm to the skeletal nodes of a 3D human body model and calculates the rotation of the bones in order to drive the model. Firstly, the keypoint coordinates generated by pose estimation are interpolated to locate keypoints that were not predicted. Then, an intermediate matrix is used to align the skeletons of different models, ensuring that the method is applicable to different models. Finally, a Kalman filter and a low-pass filter are used to smooth the human pose, in order to eliminate the random noise and high-frequency noise caused by the mismatch between the motion input and the 3D scene frame rate. In tests based on the BlazePose pose estimation model and human pose estimation datasets such as Human3.6M, the proposed algorithm performs well when driving common 3D human body models with proportions from three to nine heads tall. In a video of the same action sequence, compared with directly outputting coordinate information to the 3D scene, the mean square error (MSE) of the frame sequence output through the Kalman filter and low-pass filter is reduced from 2.871 to 0.831, which means that the jitter and popping of the model are significantly reduced.

Key words: Motion capture, 3D pose estimation, 3D model driving, Skeletal animation, Pose smoothing
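
The abstract describes two steps that are easy to illustrate in isolation: turning keypoint positions into a bone rotation, and smoothing the resulting pose sequence with a Kalman filter followed by a low-pass filter. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`rotation_between`, `rotate`, `smooth`), the constant-position Kalman model, the exponential low-pass filter, and the parameter values `q_var`, `r_var`, and `alpha` are all hypothetical choices made for illustration. The intermediate-matrix skeleton alignment and the keypoint interpolation are not reproduced here, since the abstract does not specify them.

```python
import numpy as np


def rotation_between(u, v):
    """Unit quaternion (w, x, y, z) that rotates direction u onto direction v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    w = 1.0 + float(np.dot(u, v))
    if w < 1e-8:
        # Opposite directions: rotate 180 degrees about any axis perpendicular to u.
        axis = np.cross(u, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(u, [0.0, 1.0, 0.0])
        q = np.array([0.0, *axis])
    else:
        q = np.array([w, *np.cross(u, v)])
    return q / np.linalg.norm(q)


def rotate(q, p):
    """Apply unit quaternion q = (w, x, y, z) to the 3D vector p."""
    p = np.asarray(p, dtype=float)
    w, xyz = q[0], q[1:]
    t = 2.0 * np.cross(xyz, p)
    return p + w * t + np.cross(xyz, t)


def smooth(track, q_var=1e-3, r_var=1e-2, alpha=0.5):
    """Smooth a (frames, dims) track.

    Assumption: each coordinate is filtered independently by a
    constant-position Kalman filter, then by an exponential low-pass filter.
    """
    track = np.asarray(track, dtype=float)
    out = np.empty_like(track)
    x = track[0].copy()        # Kalman state estimate
    p = np.ones_like(x)        # Kalman estimate variance
    y = track[0].copy()        # low-pass filter output
    out[0] = y
    for t in range(1, len(track)):
        p = p + q_var                      # predict: the pose is assumed to stay put
        k = p / (p + r_var)                # Kalman gain
        x = x + k * (track[t] - x)         # correct with the new measurement
        p = (1.0 - k) * p
        y = alpha * x + (1.0 - alpha) * y  # exponential low-pass
        out[t] = y
    return out
```

In this sketch, a bone whose rest direction is `u` would be driven by calling `rotation_between(u, child - parent)` on the two estimated keypoints at its ends, and each joint track would be passed through `smooth` before being sent to the 3D scene. A larger `r_var` or a smaller `alpha` gives a steadier but more delayed pose; how the paper balances these is not stated in the abstract.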