Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (9): 2694-2699. DOI: 10.11772/j.issn.1001-9081.2017.09.2694

• Frontier, Interdisciplinary and Comprehensive Applications •

Lip motion recognition of speaker based on SIFT

MA Xinjun, WU Chenchen, ZHONG Qianyuan, LI Yuanyuan

  1. College of Mechanical Engineering and Automation, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
  • Received: 2017-03-09 Revised: 2017-05-24 Online: 2017-09-10 Published: 2017-09-13
  • Corresponding author: WU Chenchen, 870715761@qq.com
  • About the authors: MA Xinjun (born 1972), male, from Shihezi, Xinjiang, associate professor, Ph.D.; main research interests: image processing and pattern recognition, intelligent vehicles and intelligent driving, biometric recognition. WU Chenchen (born 1993), female, from Puyang, Henan, master's student; main research interest: pattern recognition. ZHONG Qianyuan (born 1990), male, from Xuzhou, Jiangsu, master's student; main research interest: pattern recognition. LI Yuanyuan (born 1993), female, from Xuchang, Henan, master's student; main research interest: pattern recognition.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (51677035), the Fundamental Research Project of Shenzhen (JCYJ20150513151706580), and the Science and Technology Plan Project of Shenzhen (GRCK2016082611021550).

Abstract: To address the problems that extracted lip features are overly high-dimensional and sensitive to scale, a speaker authentication technique that uses the Scale-Invariant Feature Transform (SIFT) algorithm for feature extraction was proposed. Firstly, a simple video frame normalization algorithm was proposed to bring lip-motion videos of different lengths to a common length and to extract representative lip-motion frames. Then, an algorithm was proposed to extract texture and motion features on the basis of SIFT key points; after integration by Principal Component Analysis (PCA), representative lip-motion features were obtained for authentication. Finally, a simple classification algorithm was presented for the obtained features. The experimental results show that, compared with the common Local Binary Pattern (LBP) feature and the Histogram of Oriented Gradients (HOG) feature, the proposed feature extraction algorithm achieves a better False Acceptance Rate (FAR) and False Rejection Rate (FRR), which indicates that the whole speaker lip-motion recognition algorithm is effective and can obtain fairly satisfactory results.

Key words: lip feature, Scale-Invariant Feature Transform (SIFT), feature extraction, speaker authentication
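
The abstract names two steps that can be illustrated without the full paper: normalizing clips to a common number of frames, and scoring an authentication system by FAR and FRR. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes frames are picked by uniform index sampling, and that the classifier outputs a similarity score that is accepted when it reaches a threshold. The function names `normalize_frame_indices` and `far_frr` are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize_frame_indices(num_frames: int, target_len: int) -> List[int]:
    """Pick target_len frame indices spread evenly over a clip.

    Assumption: uniform sampling. The paper's own normalization
    algorithm is not specified in the abstract.
    """
    if num_frames <= 0 or target_len <= 0:
        raise ValueError("num_frames and target_len must be positive")
    if target_len == 1:
        return [0]
    # Spacing between sampled positions; may be fractional, and may be
    # below 1 when the clip is shorter than target_len (frames repeat).
    step = (num_frames - 1) / (target_len - 1)
    return [round(i * step) for i in range(target_len)]


def far_frr(
    genuine: Sequence[float],
    impostor: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (FAR, FRR) for similarity scores at a given threshold.

    Assumption: a trial is accepted when its score >= threshold.
    FAR = fraction of impostor trials that are accepted.
    FRR = fraction of genuine trials that are rejected.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    false_accepts = sum(1 for s in impostor if s >= threshold)
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


if __name__ == "__main__":
    # A 10-frame clip reduced to 4 representative frames.
    print(normalize_frame_indices(10, 4))
    # Illustrative scores only; these are not data from the paper.
    print(far_frr([0.9, 0.8, 0.4, 0.7], [0.1, 0.6, 0.3, 0.2], 0.5))
```

Raising the threshold in this sketch lowers FAR and raises FRR, which is why the two rates are reported together when comparing feature extractors.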

CLC number: