Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (6): 1648-1652. DOI: 10.11772/j.issn.1001-9081.2017112822

• Cyberspace Security •

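Playback speech detection algorithm based on modified cepstrum feature

LIN Lang, WANG Rangding, YAN Diqun, LI Can

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211
  • Received: 2017-12-01 Revised: 2018-01-19 Online: 2018-06-10 Published: 2018-06-13
  • Corresponding author: WANG Rangding
  • About the authors: LIN Lang (1994-), male, born in Fuyang, Anhui, M.S. candidate; research interest: multimedia information security. WANG Rangding (1962-), male, born in Tianshui, Gansu, professor, Ph.D., CCF member; research interests: multimedia information security, information hiding and steganalysis. YAN Diqun (1979-), male, born in Yuyao, Zhejiang, associate professor, Ph.D., CCF member; research interest: multimedia information security. LI Can (1992-), female, born in Huaibei, Anhui, M.S. candidate; research interest: multimedia information security.
  • Supported by:
    National Natural Science Foundation of China (61672302, U1736215); Natural Science Foundation of Zhejiang Province (LZ15F020002, LY17F020010).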

Playback speech detection algorithm based on modified cepstrum feature

LIN Lang, WANG Rangding, YAN Diqun, LI Can   

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  • Received: 2017-12-01 Revised: 2018-01-19 Online: 2018-06-10 Published: 2018-06-13
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672302, U1736215) and the Natural Science Foundation of Zhejiang Province (LZ15F020002, LY17F020010).

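Abstract: As speech technology advances, spoofed speech of many kinds, with playback speech as a typical example, poses a serious challenge to voiceprint authentication systems and audio forensics. To address playback attacks on voiceprint authentication systems, a detection algorithm based on a modified cepstrum feature is proposed. First, the coefficient of variation is used to analyze how original speech and playback speech differ in the frequency domain. Then, guided by that analysis, the Mel filter bank used in extracting Mel Frequency Cepstral Coefficients (MFCC) is replaced with a new filter bank that combines linear filters and inverse-Mel filters, which yields the modified cepstrum feature. Finally, a Gaussian Mixture Model (GMM) serves as the classifier. Experimental results show that the modified cepstrum feature detects playback speech effectively, with an equal error rate of about 3.45%.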

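Key words: coefficient of variation, Gaussian Mixture Model (GMM), playback speech detection, Mel Frequency Cepstral Coefficients (MFCC), filter bank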
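
The frequency-domain analysis step can be illustrated with a minimal sketch. The abstract does not say how the coefficient of variation is computed, so the following assumes it is the ratio of standard deviation to mean of the magnitude spectrum, taken per frequency bin across frames. The function names, the frame length, the hop size, and the synthetic test signal are all hypothetical choices made for illustration, not the authors' settings.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def coefficient_of_variation(x, frame_len=512, hop=256):
    """Per-bin coefficient of variation (std / mean) of the magnitude
    spectrum, computed across frames. Assumed definition, see lead-in."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    mean = mag.mean(axis=0)
    std = mag.std(axis=0)
    return std / np.maximum(mean, 1e-12)       # guard against division by zero

# Placeholder input: one second of white noise at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
cv = coefficient_of_variation(signal)
```

Computing `cv` separately for an original recording and for its playback, then comparing the two curves bin by bin, would show which frequency bands differ most. That comparison is the kind of evidence the abstract describes as motivating the new filter bank.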

Abstract: With the development of speech technology, various kinds of spoofed speech, typified by playback speech, pose serious challenges to voiceprint authentication systems and audio forensics technology. To counter playback speech attacks on voiceprint authentication systems, a new detection algorithm based on a modified cepstrum feature was proposed. Firstly, the coefficient of variation was used to analyze the difference between original speech and playback speech in the frequency domain. Secondly, based on this analysis, the Mel filter bank used in extracting Mel Frequency Cepstral Coefficients (MFCC) was replaced with a new filter bank composed of linear filters and inverse-Mel filters, and the modified cepstrum feature based on the new filter bank was obtained. Finally, a Gaussian Mixture Model (GMM) was used as the classifier to discriminate between original and playback speech. The experimental results show that the modified cepstrum feature can effectively detect playback speech, with an equal error rate of about 3.45%.

Key words: coefficient of variation, Gaussian Mixture Model (GMM), playback speech detection, Mel Frequency Cepstral Coefficients (MFCC), filter bank
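
The filter-bank replacement described in the abstract can be sketched as follows. The abstract only states that linear filters and inverse-Mel filters are combined; it does not give the number of filters, the band each type covers, or the cepstral order. This sketch therefore assumes, purely for illustration, linear filters below a hypothetical 4 kHz split and inverse-Mel filters (a mirrored Mel spacing, dense near the Nyquist frequency) above it. All function names and default parameters are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filters(edges_hz, n_fft, fs):
    """Filter i rises from edges[i], peaks at edges[i+1], falls to edges[i+2]."""
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)
    bank = np.zeros((len(edges_hz) - 2, freqs.size))
    for i in range(bank.shape[0]):
        lo, ctr, hi = edges_hz[i:i + 3]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def linear_plus_inverse_mel_bank(n_lin=13, n_imel=13, n_fft=512,
                                 fs=16000, split_hz=4000.0):
    """Assumed layout: linear filters on [0, split_hz], inverse-Mel filters
    on [split_hz, fs/2]. The split and filter counts are hypothetical."""
    band = fs / 2.0 - split_hz
    lin_edges = np.linspace(0.0, split_hz, n_lin + 2)
    # Mel-spaced points on [0, band], mirrored so spacing narrows toward fs/2.
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(band), n_imel + 2))
    imel_edges = split_hz + (band - mel_pts[::-1])
    return np.vstack([triangular_filters(lin_edges, n_fft, fs),
                      triangular_filters(imel_edges, n_fft, fs)])

def modified_cepstrum(frames, bank, n_ceps=12):
    """MFCC-style pipeline with the Mel bank swapped out:
    power spectrum -> filter-bank energies -> log -> DCT-II."""
    n_fft = 2 * (bank.shape[1] - 1)
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, n=n_fft, axis=1)) ** 2
    log_energy = np.log(power @ bank.T + 1e-10)
    m = bank.shape[0]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(m)[None, :]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / m)  # unnormalized DCT-II, includes c0
    return log_energy @ dct_basis.T

# Placeholder input: 10 random frames of 400 samples each.
rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 400))
bank = linear_plus_inverse_mel_bank()
feats = modified_cepstrum(frames, bank)
```

The resulting per-frame vectors in `feats` are what a classifier would consume. In the pipeline the abstract describes, that classifier is a GMM; one common arrangement, not confirmed by the abstract, is to train one model on original speech and one on playback speech and decide by comparing log-likelihoods.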
