Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (8): 2415-2422. DOI: 10.11772/j.issn.1001-9081.2021060996

• Artificial intelligence •

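Video facial landmark tracking by multi-view constrained cascade regression

Shaosheng DAI, Kun XIONG, Yunduo WU, Jiawei XIAO

  1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2021-06-11 Revised: 2021-09-27 Accepted: 2021-10-15 Online: 2022-01-25 Published: 2022-08-10
  • Corresponding author: Kun XIONG
  • About the authors: DAI Shaosheng (born 1974), male, from Huangchuan, Henan; professor, Ph.D. His research interests include image processing, deep learning, and infrared imaging.
    XIONG Kun (born 1995), male, from Yibin, Sichuan; M.S. candidate. His research interests include image processing, target detection, and deep learning.
    WU Yunduo (born 1996), male, from Putian, Fujian; M.S. candidate. His research interests include target detection and deep learning.
    XIAO Jiawei (born 1998), male, from Tianmen, Hubei; M.S. candidate. His research interests include target detection and image processing.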

Video facial landmark tracking by multi-view constrained cascade regression

Shaosheng DAI, Kun XIONG, Yunduo WU, Jiawei XIAO

  1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2021-06-11 Revised: 2021-09-27 Accepted: 2021-10-15 Online: 2022-01-25 Published: 2022-08-10
  • Contact: Kun XIONG
  • About the authors: DAI Shaosheng, born in 1974, Ph.D., professor. His research interests include image processing, deep learning, and infrared imaging.
    XIONG Kun, born in 1995, M.S. candidate. His research interests include image processing, target detection, and deep learning.
    WU Yunduo, born in 1996, M.S. candidate. His research interests include target detection and deep learning.
    XIAO Jiawei, born in 1998, M.S. candidate. His research interests include target detection and image processing.

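Abstract:

In recent years, facial landmark detection in static images has improved greatly. However, variations in head pose, occlusion, and illumination in real-world videos keep facial landmark detection and tracking challenging. To address this problem, a video facial landmark tracking algorithm based on multi-view constrained cascade regression is proposed. First, a transformation is established between 3D and 2D sparse point sets and used to estimate the initial shape. Second, because face images differ widely in pose, an affine transformation is applied to correct the pose of each face image. When the shape regression model is built, a multi-view constrained cascade regression model is used to reduce shape variance, which makes the learned regression model more robust to shape variance. Finally, a reinitialization mechanism is adopted, and when the landmarks are correctly located, a Normalized Cross Correlation (NCC) template matching tracker establishes the shape relationship between consecutive frames. Experimental results on the public dataset show that the average error of the algorithm is less than 10% of the interocular distance.

Keywords: facial landmark tracking, multi-view constraint, affine transformation, cascade regression, Normalized Cross Correlation (NCC)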
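
The pose-correction step can be illustrated with a small sketch. The abstract does not say how the affine transformation is estimated, so the code below assumes a least-squares fit between detected landmarks and a reference shape. The function names `estimate_affine` and `apply_affine` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np


def estimate_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine matrix [M | t] mapping src points onto dst.

    src, dst: arrays of shape (n, 2) with n >= 3 non-collinear points.
    Assumption: a plain least-squares fit; the paper may use another estimator.
    """
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])                       # homogeneous coordinates, (n, 3)
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)    # solves X @ sol ≈ dst, sol is (3, 2)
    return sol.T                                     # (2, 3): linear part and translation


def apply_affine(A: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    return pts @ A[:, :2].T + A[:, 2]
```

A 2×3 matrix of this layout is also the form that image-warping routines such as OpenCV's `cv2.warpAffine` accept, so the same estimate could in principle be used to warp the face image rather than only its landmarks.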

Abstract:

In recent years, algorithms for detecting facial landmarks in static images have been greatly improved. However, facial landmark detection and tracking remain challenging because head pose, occlusion and illumination vary in real videos. To solve this problem, a video facial landmark tracking algorithm based on multi-view constrained cascade regression was proposed. Firstly, 3-dimensional and 2-dimensional sparse point sets were used to establish a transformation relationship and estimate the initial shape. Secondly, because pose differs greatly between face images, affine transformation was used to correct the pose of the face images. When the shape regression model was constructed, the multi-view constrained cascade regression model was used to reduce the shape variance, so that the learned regression model was more robust to shape variance. Finally, a reinitialization mechanism was adopted, and when the feature points were correctly located, a Normalized Cross Correlation (NCC) template matching tracking algorithm was used to establish the shape relationship between consecutive frames. Experimental results on the public dataset used for testing show that the average error of the proposed algorithm is less than 10% of the interocular distance.

Key words: facial landmark tracking, multi-view constraint, affine transformation, cascade regression, Normalized Cross Correlation (NCC)
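
The NCC template matching step named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the zero-mean variant of normalized cross correlation, an exhaustive search in a small window around the previous landmark position, and grayscale frames stored as 2D NumPy arrays. The function names `ncc` and `track_landmark` and the `radius` parameter are hypothetical; the paper's actual search strategy and NCC variant are not given in the abstract.

```python
import numpy as np


def ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized cross correlation of two equally sized patches."""
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    if denom == 0.0:
        return 0.0  # a flat patch carries no structure to correlate
    return float((p * t).sum() / denom)


def track_landmark(frame, template, center, radius):
    """Find the best NCC match for `template` near `center` (row, col).

    Searches every position within `radius` pixels of `center` and returns
    the best (row, col) together with its NCC score.
    """
    th, tw = template.shape
    hh, hw = th // 2, tw // 2
    best_score, best_pos = -1.0, center
    r0, c0 = center
    for r in range(r0 - radius, r0 + radius + 1):
        for c in range(c0 - radius, c0 + radius + 1):
            top, left = r - hh, c - hw
            # Skip candidate windows that fall outside the frame.
            if top < 0 or left < 0:
                continue
            if top + th > frame.shape[0] or left + tw > frame.shape[1]:
                continue
            score = ncc(frame[top:top + th, left:left + tw], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

The returned score lies in [-1, 1]. One plausible use, again an assumption rather than the authors' stated rule, is to compare it against a threshold and fall back to reinitialization when the match is weak.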

CLC Number: