Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3658-3665. DOI: 10.11772/j.issn.1001-9081.2020050660

• Virtual Reality and Multimedia Computing •

  • Corresponding author: TANG Wanmei, born in 1965 in Chongqing, Ph. D., professor. Her research interests include neural networks and forecasting theory and methods. E-mail: 1093895431@qq.com
  • About the authors: PI Jiatian, born in 1990 in Qianjiang, Hubei, Ph. D., lecturer. His research interests include computer vision, bionic binocular vision, and simultaneous localization and mapping. YANG Jiezhi, born in 1994 in Chongqing, M. S. candidate. His research interests include computer vision, deep learning, and pattern recognition. YANG Linxi, born in 1996 in Chongqing, M. S. candidate. Her research interests include machine learning, optimization algorithms, and nonlinear programming. PENG Mingjie, born in 1993 in Nanchong, Sichuan, M. S. candidate. His research interests include computer vision, deep learning, and object detection. DENG Xiong, born in 1991 in Dengzhou, Henan, M. S. candidate. His research interests include statistical machine learning, pattern recognition, and uncertainty reasoning. ZHAO Lijun, born in 1970 in Chaoyang, Liaoning, Ph. D. His research interests include artificial intelligence. WU Zhiyou, born in 1967 in Chongqing, Ph. D., professor. Her research interests include nonlinear programming, optimization theory and algorithms, global optimization theory and algorithms, and combinatorial optimization.

Lightweight face liveness detection method based on multi-modal feature fusion

PI Jiatian1,2,3, YANG Jiezhi1,2,3, YANG Linxi2,3,4, PENG Mingjie1,2,3, DENG Xiong3,4, ZHAO Lijun3, TANG Wanmei1, WU Zhiyou3,4   

  1. School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China;
    2. Chongqing Digital Agriculture Service Engineering Technology Research Center(Chongqing Normal University), Chongqing 401331, China;
    3. Chongqing Key Laboratory of Intelligent Finance and Big Data Analysis(Chongqing Normal University), Chongqing 401331, China;
    4. School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China
  • Received: 2020-05-18  Revised: 2020-07-13  Online: 2020-12-10  Published: 2020-08-14
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (11971083), the Youth Project of the Science and Technology Project of Chongqing Municipal Education Commission (KJQN201800521), the Chongqing Basic Research and Frontier Exploration Project (cstc2018jcyjAX0470), and the Chongqing Normal University Graduate Research and Innovation Project in 2019 (YKC19014).



Abstract: Face liveness detection is an important part of the face recognition pipeline and is particularly critical for the security of identity verification. To counter spoofing attacks in face recognition such as photos, videos, masks, hoods and head models, RGB and depth images of the face were captured with an Intel RealSense camera, and a lightweight feature-fusion liveness detection network was proposed on the basis of MobileNetV3, in which the features of the depth and RGB images were fused and the network was trained end to end. To reduce the large number of parameters in deep networks and to let the network tail distinguish between weight regions, a Streaming Module was introduced at the tail of the network. Simulation experiments were performed on the CASIA-SURF dataset and the constructed CQNU-LN dataset. The results show that, on both datasets, the proposed method achieves an accuracy of 95% at TPR@FPR=10^-4, which is 0.1% and 0.05% higher, respectively, than that of ShuffleNet, the most accurate of the compared methods. On the constructed CQNU-3Dmask dataset, the proposed method reaches an accuracy of 95.2% at TPR@FPR=10^-4, which is 0.9% and 6.5% higher than those of the models trained on RGB images only and on depth images only, respectively. In addition, the parameter file of the proposed model is only 1.8 MB in size, and its number of FLoating-point OPerations (FLOPs) is only 1.5×10^6. The proposed method can therefore perform accurate, real-time liveness detection on detected faces in practical applications.
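The sketch below is a minimal, hypothetical PyTorch rendering of the kind of two-branch fusion network the abstract describes; it is not the authors' released code. The branch widths, the 112×112 input resolution, and the exact form of the Streaming Module (here, a depthwise convolution over the final feature map in place of global average pooling, followed by a linear classifier) are assumptions made for illustration; in the paper the branches are built from MobileNetV3 blocks.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=1):
    # 3x3 convolution + BatchNorm + hard-swish, standing in for MobileNetV3 blocks
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.Hardswish(),
    )

class FusionLivenessNet(nn.Module):
    # Dual-branch RGB + depth liveness classifier (illustrative only)
    def __init__(self, num_classes=2):
        super().__init__()
        # one lightweight branch per modality; the real model uses MobileNetV3 branches here
        self.rgb_branch = nn.Sequential(conv_block(3, 16, 2), conv_block(16, 32, 2), conv_block(32, 64, 2))
        self.depth_branch = nn.Sequential(conv_block(1, 16, 2), conv_block(16, 32, 2), conv_block(32, 64, 2))
        self.fuse = conv_block(128, 128)
        # "streaming"-style tail: a depthwise convolution over the whole 14x14 feature map
        # replaces global average pooling, so each spatial (weight) region keeps its own
        # weights while the added parameter count stays small
        self.streaming_tail = nn.Sequential(
            nn.Conv2d(128, 128, kernel_size=14, groups=128, bias=False),
            nn.Flatten(),
            nn.Linear(128, num_classes),
        )

    def forward(self, rgb, depth):
        # fuse the two modalities by channel concatenation, then classify
        fused = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.streaming_tail(self.fuse(fused))

# usage: one 112x112 RGB/depth pair (the input resolution is an assumption)
model = FusionLivenessNet()
logits = model(torch.randn(1, 3, 112, 112), torch.randn(1, 1, 112, 112))
print(logits.shape)  # torch.Size([1, 2])

Replacing global pooling with a full-size depthwise convolution is one way to keep separate weights for different spatial regions while adding few parameters, which matches the stated purpose of the Streaming Module.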

Key words: computer vision, convolutional neural network, face liveness detection, multi-modal feature fusion, lightweight network

CLC number: