Video facial landmark tracking by multi-view constrained cascade regression

doi:10.11772/j.issn.1001-9081.2021060996

Journal of Computer Applications, 2022, Vol. 42, Issue (8): 2415-2422. DOI: 10.11772/j.issn.1001-9081.2021060996

Special topic: Artificial intelligence

Shaosheng DAI, Kun XIONG, Yunduo WU, Jiawei XIAO

School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2021-06-11 Revised: 2021-09-27 Accepted: 2021-10-15 Online: 2022-01-25 Published: 2022-08-10
Corresponding author: Kun XIONG
About the authors: DAI Shaosheng, born in 1974 in Huangchuan, Henan, Ph.D., professor. His research interests include image processing, deep learning, and infrared imaging.
XIONG Kun, born in 1995 in Yibin, Sichuan, M.S. candidate. His research interests include image processing, object detection, and deep learning.
WU Yunduo, born in 1996 in Putian, Fujian, M.S. candidate. His research interests include object detection and deep learning.
XIAO Jiawei, born in 1998 in Tianmen, Hubei, M.S. candidate. His research interests include object detection and image processing.

Abstract

In recent years, algorithms for detecting facial landmarks in static images have improved greatly. However, facial landmark detection and tracking in real videos remain challenging because head pose, occlusion, and illumination vary. To address this problem, a video facial landmark tracking algorithm based on multi-view constrained cascade regression is proposed. First, a transformation is established between 3D and 2D sparse point sets and used to estimate the initial shape. Second, because face images differ widely in pose, an affine transformation is used to correct the pose of each face image. When the shape regression model is constructed, a multi-view constrained cascade regression model is used to reduce the shape variance, so that the learned regression model is more robust to shape variance. Finally, a reinitialization mechanism is adopted, and when the landmarks are correctly located, a Normalized Cross Correlation (NCC) template matching tracking algorithm is used to establish the shape relationship between consecutive frames. Experimental results on public datasets show that the average error of the proposed algorithm is less than 10% of the interocular distance.

Key words: facial landmark tracking, multi-view constraint, affine transformation, cascade regression, Normalized Cross Correlation (NCC)
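
The NCC template matching step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the template half-size `half`, and the search radius `radius` are assumptions chosen for illustration, and the zero-mean form of NCC is used. A patch around a landmark in the previous frame serves as the template, and the next frame is searched in a small window for the position with the highest correlation.

```python
import numpy as np


def ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = patch.astype(np.float64) - patch.mean()
    b = template.astype(np.float64) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # Flat patches carry no texture, so treat them as uncorrelated.
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def track_landmark(prev_frame, next_frame, point, half=4, radius=6):
    """Search next_frame around `point` (row, col) for the best NCC match.

    `half` (template half-size) and `radius` (search radius) are
    illustrative values, not taken from the paper. `point` is assumed
    to lie at least `half` pixels inside prev_frame.
    Returns the new (row, col) and the best NCC score.
    """
    r, c = point
    template = prev_frame[r - half:r + half + 1, c - half:c + half + 1]
    best_score, best_pos = -1.0, point
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            # Skip candidates whose window would leave the image.
            if (rr - half < 0 or cc - half < 0 or
                    rr + half + 1 > next_frame.shape[0] or
                    cc + half + 1 > next_frame.shape[1]):
                continue
            patch = next_frame[rr - half:rr + half + 1,
                               cc - half:cc + half + 1]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (rr, cc)
    return best_pos, best_score


# Synthetic check: shift a random texture by 2 rows down and 3 columns left.
rng = np.random.default_rng(0)
frame0 = rng.random((64, 64))
frame1 = np.roll(frame0, shift=(2, -3), axis=(0, 1))
new_pos, score = track_landmark(frame0, frame1, (30, 30))
```

In a tracker of this kind, a low best score is one plausible signal that a landmark has been lost and that reinitialization should be triggered; the threshold for that decision is not given in the abstract and would need to be tuned.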

CLC number: TP391

Shaosheng DAI, Kun XIONG, Yunduo WU, Jiawei XIAO. Video facial landmark tracking by multi-view constrained cascade regression[J]. Journal of Computer Applications, 2022, 42(8): 2415-2422.

Figures/Tables: 17

Fig. 1 Basic process of video facial landmark tracking

Fig. 2 Transformation results of different projection mapping relations

Fig. 3 Estimating 66 points from 5 points

Fig. 4 Disorder and overlapping of feature points

Fig. 5 Shape changes for different increment multiples

Fig. 6 24 shapes with different increments

Fig. 7 Shapes before and after adjustment

Fig. 8 Results of affine transformation

Fig. 9 NCC template matching tracking process

Fig. 10 NCC template matching tracking results

Tab. 1 Comparison of average error between the proposed algorithm (MCCR) and other algorithms

Algorithm	LFPW	Helen	300W (common)	300W (challenging)	300W (full)
ESR	-	-	5.28	17.00	7.58
SDM	5.67	5.50	5.57	15.40	7.50
ERT	-	-	-	-	6.40
LBF	-	-	4.95	11.98	6.32
GN-DPM	5.92	5.69	5.78	-	-
CFSS	-	-	4.73	9.98	5.76
3DDFA	-	-	6.15	10.59	7.01
MDM	-	-	4.83	10.14	5.88
SeqMT	-	-	4.84	9.93	5.74
PFLD	-	-	3.38	6.83	4.02
LUVLi	-	-	2.76	5.16	3.23
MCCR	5.51	5.31	5.39	9.72	6.24
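
The abstract reports the average error relative to the interocular distance. The sketch below assumes that the values in Table 1 are mean point-to-point errors expressed as a percentage of that distance, and shows how such a figure could be computed for one face. The function name and the eye landmark indices are hypothetical, since the normalization landmarks depend on the annotation scheme and are not stated on this page.

```python
import numpy as np


def normalized_mean_error(pred, gt, left_eye_idx, right_eye_idx):
    """Mean point-to-point error divided by the interocular distance.

    pred, gt: arrays of shape (n_points, 2) holding (x, y) coordinates.
    left_eye_idx, right_eye_idx: indices of the two reference landmarks
    (hypothetical; the choice depends on the annotation scheme).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    per_point = np.linalg.norm(pred - gt, axis=1)
    interocular = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return per_point.mean() / interocular


# Toy example: 3 landmarks, eyes 100 px apart, every prediction off by 5 px.
gt = np.array([[0.0, 0.0], [100.0, 0.0], [50.0, 60.0]])
pred = gt + np.array([3.0, 4.0])
nme = normalized_mean_error(pred, gt, left_eye_idx=0, right_eye_idx=1)
print(f"{100 * nme:.2f}%")  # prints 5.00%
```

Under this reading, the MCCR entry of 6.24 on the full 300W set would correspond to a mean landmark error of about 6.24% of the interocular distance, averaged over the test images.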

Fig. 11 Example images of MCCR

Fig. 12 Experimental results of reinitialization on the 300VW dataset

Fig. 13 Cumulative error curves and normalized errors of the first 400 frames

Fig. 14 Cumulative error curves on the 300VW dataset

Tab. 2 Comparison of model size and processing time on different processing platforms

Algorithm	Processor/GPU	Model size/Mb	Processing time/ms
SDM	i7-6700K	10.1	16
LAB	TITAN X	50.7	60
SAN	GTX 1080Ti	270.5+528.0	343
MCCR	GTX 1080Ti	7.6	21

Fig. 15 Examples of results on the 300VW dataset

