3D face reconstruction and dense face alignment method based on improved 3D morphable model

doi:10.11772/j.issn.1001-9081.2020030420

Abstract

The widely used 3D morphable model has limited representation power, so 3D face models reconstructed with it generalize poorly. To address this problem, a novel method for 3D face reconstruction and dense face alignment from a single face image under unknown pose, expression and illumination was proposed. First, the existing 3D morphable model was improved with a Convolutional Neural Network (CNN) to increase the representation power of the 3D face model. Then, based on face smoothness and image similarity, new loss functions were proposed at the feature point level and the pixel level, and the CNN model was trained with weakly-supervised learning. Finally, the trained network model was used to perform 3D face reconstruction and dense face alignment. Experimental results show that, for 3D face reconstruction, the proposed model achieves a normalized mean error of 2.25 on AFLW2000-3D, and for dense face alignment, it achieves normalized mean errors of 3.80 and 3.34 on AFLW2000-3D and AFLW-LFPA respectively. Compared with the original method using the 3D morphable model, the proposed model reduces the normalized mean errors in 3D face reconstruction and dense face alignment by 7.4% and 7.8% respectively. For face images with different illumination conditions and angles, the network model reconstructs accurately and robustly, and delivers high-quality 3D face reconstruction and dense face alignment.

Key words: 3D face reconstruction, dense face alignment, 3D morphable model, weakly-supervised learning, Convolutional Neural Network (CNN)
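
The results above are reported as normalized mean error (NME). As a rough illustration only, the sketch below computes an NME as the mean Euclidean distance between predicted and ground-truth points, divided by a normalizing length and expressed in percent. The abstract does not state which normalization the authors use; defaulting to the square root of the ground-truth bounding-box area is an assumption here, and the function name `normalized_mean_error` and its parameters are hypothetical.

```python
import math
from typing import Optional, Sequence


def normalized_mean_error(
    pred: Sequence[Sequence[float]],
    gt: Sequence[Sequence[float]],
    norm: Optional[float] = None,
) -> float:
    """Return the mean point-to-point error divided by `norm`, in percent.

    `pred` and `gt` are equally long sequences of 2D or 3D points.
    If `norm` is omitted, sqrt(width * height) of the ground-truth
    bounding box (x and y only) is used. This default is an assumption,
    not necessarily the normalization used in the paper.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and equally long")

    if norm is None:
        xs = [p[0] for p in gt]
        ys = [p[1] for p in gt]
        norm = math.sqrt((max(xs) - min(xs)) * (max(ys) - min(ys)))
    if norm <= 0:
        raise ValueError("normalizing length must be positive")

    # Average Euclidean distance over all corresponding points.
    mean_err = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    return 100.0 * mean_err / norm


# Toy example: every predicted point is offset by (3, 4), i.e. 5 pixels,
# on a 100 x 100 ground-truth bounding box.
gt_points = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
pred_points = [(x + 3.0, y + 4.0) for x, y in gt_points]
nme = normalized_mean_error(pred_points, gt_points)
print(f"NME = {nme:.2f}%")
```

Passing an explicit `norm` (for example an inter-ocular distance) lets the same function follow a different benchmark convention without changing the error computation.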

CLC number: TP391.41

ZHOU Jian, HUANG Zhangjin. 3D face reconstruction and dense face alignment method based on improved 3D morphable model[J]. Journal of Computer Applications, 2020, 40(11): 3306-3313.
