Playback speech detection algorithm based on modified cepstrum feature

Journal of Computer Applications, 2018, Vol. 38, Issue (6): 1648-1652. DOI: 10.11772/j.issn.1001-9081.2017112822

LIN Lang, WANG Rangding, YAN Diqun, LI Can

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo Zhejiang 315211, China

Received: 2017-12-01 Revised: 2018-01-19 Online: 2018-06-10 Published: 2018-06-13
Corresponding author: WANG Rangding
About the authors: LIN Lang (born 1994), male, from Fuyang, Anhui, M.S. candidate; research interest: multimedia information security. WANG Rangding (born 1962), male, from Tianshui, Gansu, professor, Ph.D., CCF member; research interests: multimedia information security, information hiding and steganalysis. YAN Diqun (born 1979), male, from Yuyao, Zhejiang, associate professor, Ph.D., CCF member; research interest: multimedia information security. LI Can (born 1992), female, from Huaibei, Anhui, M.S. candidate; research interest: multimedia information security.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61672302, U1736215) and the Natural Science Foundation of Zhejiang Province (LZ15F020002, LY17F020010).

Abstract

With the development of speech technology, spoofed speech of various kinds, typified by playback speech, poses a serious challenge to voiceprint authentication systems and audio forensics. To counter playback attacks on voiceprint authentication systems, a detection algorithm based on a modified cepstrum feature is proposed. First, the coefficient of variation is used to analyze the differences between original speech and playback speech in the frequency domain. Second, guided by that analysis, the Mel filter bank used in extracting Mel Frequency Cepstral Coefficients (MFCC) is replaced by a new filter bank that combines linear filters with inverse-Mel filters, which yields the modified cepstrum feature. Finally, a Gaussian Mixture Model (GMM) is used as the classifier. Experimental results show that the modified cepstrum feature detects playback speech effectively, with an equal error rate of about 3.45%.

Key words: coefficient of variation, Gaussian Mixture Model (GMM), playback speech detection, Mel Frequency Cepstral Coefficients (MFCC), filter bank
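
The feature-extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the number of filters, which band each filter type covers, or the crossover frequency. The sketch assumes linear filters below a hypothetical `split_hz` and inverse-Mel filters (a Mel-spaced layout mirrored so that it is densest near the top of the band) above it. All function names and parameters are illustrative, and only the Python standard library is used.

```python
import math

def coefficient_of_variation(values):
    """Ratio of the (population) standard deviation to the mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def linear_edges(f_lo, f_hi, n_filters):
    """n_filters + 2 equally spaced edge frequencies in Hz."""
    step = (f_hi - f_lo) / (n_filters + 1)
    return [f_lo + i * step for i in range(n_filters + 2)]

def inverse_mel_edges(f_lo, f_hi, n_filters):
    """Mel-spaced edges mirrored inside [f_lo, f_hi]: densest near f_hi."""
    m_hi = hz_to_mel(f_hi - f_lo)
    offsets = [mel_to_hz(m_hi * i / (n_filters + 1)) for i in range(n_filters + 2)]
    return [f_hi - p for p in reversed(offsets)]

def triangular_bank(edges, n_bins, sample_rate):
    """One triangular filter per interior edge, sampled at the FFT bin frequencies."""
    nyquist = sample_rate / 2.0
    freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for lo, mid, hi in zip(edges, edges[1:], edges[2:]):
        row = []
        for f in freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def combined_bank(n_linear, n_imel, split_hz, n_bins, sample_rate):
    """ASSUMPTION: linear filters below split_hz, inverse-Mel filters above it."""
    nyquist = sample_rate / 2.0
    low = triangular_bank(linear_edges(0.0, split_hz, n_linear), n_bins, sample_rate)
    high = triangular_bank(inverse_mel_edges(split_hz, nyquist, n_imel), n_bins, sample_rate)
    return low + high

def cepstrum(power_spectrum, bank, n_ceps):
    """Log filter-bank energies followed by a DCT-II, as in standard MFCC extraction."""
    log_e = [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-10)
             for row in bank]
    n = len(log_e)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n) for j, e in enumerate(log_e))
            for c in range(n_ceps)]

# Illustrative values only: 16 kHz audio, 512-point FFT (257 bins), 4 kHz crossover.
bank = combined_bank(n_linear=10, n_imel=10, split_hz=4000.0, n_bins=257, sample_rate=16000)
features = cepstrum([1.0] * 257, bank, n_ceps=12)
```

In this sketch, `coefficient_of_variation` would be applied per frequency bin across frames to compare genuine and replayed recordings, and the per-frame output of `cepstrum` would then be passed to a GMM classifier, which is omitted here.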

CLC Number: TN912.3

LIN Lang, WANG Rangding, YAN Diqun, LI Can. Playback speech detection algorithm based on modified cepstrum feature[J]. Journal of Computer Applications, 2018, 38(6): 1648-1652.
