Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (1): 79-83. DOI: 10.11772/j.issn.1001-9081.2017071896

• Papers from the 2017 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2017) •

基于卷积神经网络的翻录语音检测算法

李璨, 王让定, 严迪群   

  1. 宁波大学 信息科学与工程学院, 浙江 宁波 315211
  • 收稿日期:2017-08-01 修回日期:2017-08-19 出版日期:2018-01-10 发布日期:2018-01-22
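  • Corresponding author: WANG Rangding
  • About the authors: LI Can (born 1992), female, from Huaibei, Anhui, M.S. candidate; main research interest: multimedia information security. WANG Rangding (born 1962), male, from Ningbo, Zhejiang, professor, Ph.D., CCF member; main research interests: multimedia information security and digital forensics. YAN Diqun (born 1979), male, from Ningbo, Zhejiang, associate professor, Ph.D., CCF member; main research interests: multimedia information security and digital forensics.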
  • 基金资助:
    国家自然科学基金资助项目(61672302,61300055);浙江省自然科学基金资助项目(LZ15F020002,LY17F020010);宁波市自然科学基金资助项目(2017A610123);宁波大学科研基金资助项目(XKXL1509,XKXL1503);宁波大学王宽诚幸福基金资助项目。

Recaptured speech detection algorithm based on convolutional neural network

LI Can, WANG Rangding, YAN Diqun   

  1. College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China
  • Received: 2017-08-01 Revised: 2017-08-19 Online: 2018-01-10 Published: 2018-01-22
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672302, 61300055), the Natural Science Foundation of Zhejiang Province (LZ15F020002, LY17F020010), the Natural Science Foundation of Ningbo (2017A610123), the Scientific Research Foundation of Ningbo University (XKXL1509, XKXL1503), and the K.C. Wong Magna Fund in Ningbo University.

摘要: 针对翻录语音攻击说话人识别系统,危害合法用户的权益问题,提出了一种基于卷积神经网络(CNN)的翻录语音检测算法。首先,通过提取原始语音与翻录语音的语谱图,并将其输入到卷积神经网络中,对其进行特征提取及分类;然后,搭建了适应于检测翻录语音的网络框架,分析讨论了输入不同窗移的语谱图对检测率的影响;最后,对不同偷录及回放设备的翻录语音进行了交叉实验检测,并与现有的经典算法进行了对比。实验结果表明,所提方法能够准确地判断待测语音是否为翻录语音,其识别率达到了99.26%,与静音段梅尔频率倒谱系数(MFCC)算法、信道模式噪声算法和长时窗比例因子算法相比,识别率分别提高了约26个百分点、21个百分点和0.35个百分点。

关键词: 卷积神经网络, 翻录语音检测, 语谱图, 录音设备, 网络框架

Abstract: Recaptured speech can be used to attack speaker recognition systems and thereby harm the rights and interests of legitimate users. To address this problem, a recaptured speech detection algorithm based on a Convolutional Neural Network (CNN) was proposed. Firstly, spectrograms of the original speech and the recaptured speech were extracted and fed into the CNN for feature extraction and classification. Secondly, a network architecture suited to recaptured speech detection was built, and the effect of spectrograms computed with different window shifts on the detection rate was analyzed. Finally, cross-device experiments were conducted on speech recaptured with various recording and playback devices, and the results were compared with existing classical algorithms. The experimental results show that the proposed method can accurately determine whether a test utterance is recaptured, with a recognition rate of 99.26%. Compared with the silent-segment Mel-Frequency Cepstral Coefficient (MFCC) algorithm, the channel pattern noise algorithm and the long-window scale factor algorithm, the recognition rate is higher by about 26, 21 and 0.35 percentage points, respectively.
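The spectrogram input described in the abstract, and the role of the window shift, can be illustrated with a minimal NumPy sketch. The abstract does not state the settings used, so everything here is an assumption for illustration: the function name `log_spectrogram`, the 16 kHz sampling rate, the 512-sample Hamming window, and the window shifts of 80, 160 and 320 samples. The CNN itself is not sketched.

```python
import numpy as np


def log_spectrogram(signal, win_len, win_shift, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    win_len and win_shift are given in samples. All values used below are
    illustrative assumptions, not the settings of the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one analysis window")
    # Number of full windows that fit when hopping by win_shift samples.
    n_frames = 1 + (len(signal) - win_len) // win_shift
    window = np.hamming(win_len)
    frames = np.stack([
        signal[i * win_shift: i * win_shift + win_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of each frame, then magnitude in decibels.
    spectrum = np.fft.rfft(frames, axis=1)
    return 20.0 * np.log10(np.abs(spectrum) + eps).T


# Hypothetical test signal: one second of a 1 kHz tone sampled at 16 kHz.
fs = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)

# A smaller window shift yields more frames, i.e. a wider input image.
for shift in (80, 160, 320):
    spec = log_spectrogram(tone, win_len=512, win_shift=shift)
    print(shift, spec.shape)
```

Halving the window shift roughly doubles the number of time frames while leaving the number of frequency bins unchanged, which is why the choice of window shift changes the size and temporal resolution of the image presented to the network.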

Key words: Convolutional Neural Network (CNN), recaptured speech detection, spectrogram, recording device, network architecture

CLC number: