Three-dimensional human reconstruction model based on high-resolution net and graph convolutional network

doi:10.11772/j.issn.1001-9081.2021122075

Abstract

To address head pose flipping and the loss of implicit spatial cues between image features when reconstructing the human body from monocular images, a three-dimensional human reconstruction model based on High-Resolution Net (HRNet) and Graph Convolutional Network (GCN) was proposed. Firstly, rich human feature information was extracted from the original image by using HRNet and residual blocks as the backbone network. Then, GCN was used to capture the implicit spatial cues between features, yielding a spatially accurate feature representation. Finally, these features were used to predict the parameters of the Skinned Multi-Person Linear (SMPL) model, giving more accurate reconstruction results. In addition, to solve the head pose flipping problem, the joint points of SMPL were redefined by adding head joint points to the original joints. Experimental results show that the proposed model reconstructs the three-dimensional human body accurately. Its reconstruction accuracy on the 2D dataset LSP reaches 92.41%, and on the 3D dataset MPI-INF-3DHP its joint error and reconstruction error are greatly reduced, averaging only 97.73 mm and 64.63 mm respectively, which verifies the effectiveness of the proposed model for human reconstruction.
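
The abstract does not state which graph-convolution rule the model uses. As a rough illustration of how a GCN layer mixes the features of neighbouring nodes, the sketch below assumes the widely used symmetric-normalized propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The toy graph, the feature values, and the function names (`normalize_adjacency`, `gcn_layer`) are hypothetical and are not taken from the paper.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    a_norm = normalize_adjacency(adj)
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical chain graph of three nodes: 0 - 1 - 2
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
# Identity weights, so the output shows only the neighbourhood mixing.
weight = [[1.0, 0.0],
          [0.0, 1.0]]

out = gcn_layer(adj, features, weight)
for row in out:
    print([round(v, 3) for v in row])
```

With identity weights, each output row is a degree-weighted blend of a node's own feature and its neighbours' features, which is the sense in which a graph convolution can carry spatial context from one node to another.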

Key words: Graph Convolutional Network (GCN), High-Resolution Net (HRNet), human reconstruction, Skinned Multi-Person Linear model (SMPL), residual block

CLC Number:

TP391.41

Yating SU, Cuixiang LIU. Three-dimensional human reconstruction model based on high-resolution net and graph convolutional network[J]. Journal of Computer Applications, 2023, 43(2): 583-588.

Figures/Tables 11

Results on the 2D dataset LSP
Model	F1	Accuracy (%)
SMPLify	84.90	90.56
HMR	86.95	91.02
CMR	87.10	91.55
Proposed model	88.03	92.41

Joint error on the 3D dataset MPI-INF-3DHP (mm)
Test sequence	SMPLify	HMR	CMR	Proposed model
Average	943.57	235.73	181.80	97.73
TS1	844.13	187.09	145.70	63.36
TS2	897.08	283.63	172.57	89.76
TS3	1059.01	251.29	160.07	91.96
TS4	974.92	265.72	233.98	106.48
TS5	856.23	172.19	208.39	116.07
TS6	1030.02	254.45	170.09	118.73
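
The page does not define how the joint error is computed. A common choice for MPI-INF-3DHP is the mean per-joint position error (MPJPE), the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below assumes that definition. The function name `mpjpe` and the two-joint example are hypothetical. The reconstruction error is often the same quantity measured after rigid Procrustes alignment, which is likewise an assumption here and is not shown.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding 3D joints.

    The result is in the same unit as the input coordinates (mm here).
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Hypothetical two-joint skeleton, coordinates in millimetres.
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 100.0, 12.0)]

error_mm = mpjpe(pred, gt)
print(error_mm)  # per-joint distances are 5 mm and 12 mm, so the mean is 8.5
```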

Reconstruction error on the 3D dataset MPI-INF-3DHP (mm)
Test sequence	SMPLify	HMR	CMR	Proposed model
Average	138.85	130.63	97.38	64.63
TS1	171.14	102.07	75.29	41.72
TS2	145.51	132.44	112.70	60.29
TS3	123.27	142.19	91.94	58.60
TS4	135.35	152.72	110.51	66.00
TS5	138.76	108.19	85.66	73.86
TS6	119.09	146.15	108.15	87.31
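
The average row in the two MPI-INF-3DHP tables appears to be the unweighted mean over the six test sequences TS1 to TS6. The sketch below recomputes it from the per-sequence values of the proposed model and of CMR, the strongest baseline in both tables, and derives the relative error reduction. The variable names and the helper `relative_reduction` are illustrative, not from the paper.

```python
from statistics import mean

# Per-sequence errors in mm (TS1..TS6), copied from the two tables above.
joint_error = {
    "CMR": [145.70, 172.57, 160.07, 233.98, 208.39, 170.09],
    "Proposed": [63.36, 89.76, 91.96, 106.48, 116.07, 118.73],
}
reconstruction_error = {
    "CMR": [75.29, 112.70, 91.94, 110.51, 85.66, 108.15],
    "Proposed": [41.72, 60.29, 58.60, 66.00, 73.86, 87.31],
}


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


for name, table in (("joint", joint_error),
                    ("reconstruction", reconstruction_error)):
    avg = {model: mean(values) for model, values in table.items()}
    gain = relative_reduction(avg["CMR"], avg["Proposed"])
    print(f"{name}: proposed {avg['Proposed']:.2f} mm, "
          f"CMR {avg['CMR']:.2f} mm, reduction {gain:.1f}%")
```

Under this reading, the recomputed means of the proposed model match the reported averages of 97.73 mm and 64.63 mm to two decimals.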