Forensics algorithm of various operations for digital speech

doi:10.11772/j.issn.1001-9081.2018071596

Journal of Computer Applications, 2019, Vol. 39, Issue 1: 126-130. DOI: 10.11772/j.issn.1001-9081.2018071596



XIANG Li, YAN Diqun, WANG Rangding, LI Xiaowen

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo Zhejiang 315211, China

Received: 2018-07-19; Revised: 2018-08-07; Online: 2019-01-10; Published: 2019-01-21
Supported by:
This work is partially supported by the National Natural Science Foundation of China (U1736215, 61672302), the Natural Science Foundation of Zhejiang Province (LZ15F020002, LY17F020010), the Natural Science Foundation of Ningbo (2017A610123), and the Ningbo University Fund (XKXL1509, XKXL1503).

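Chinese title (translated): Digital speech forensics algorithm for multiple processing traces

XIANG Li, YAN Diqun, WANG Rangding, LI Xiaowen

School of Information Science and Engineering, Ningbo University, Ningbo Zhejiang 315211, China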

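Corresponding author: YAN Diqun
About the authors: XIANG Li (born 1994), male, from Xiangxi, Hunan, M.S. candidate; main research interests: multimedia communication, information security. YAN Diqun (born 1979), male, from Yuyao, Zhejiang, associate professor, Ph.D., CCF member; main research interests: multimedia communication, information security. WANG Rangding (born 1962), male, from Tianshui, Gansu, professor, Ph.D., CCF member; main research interests: multimedia communication security, information hiding and steganalysis. LI Xiaowen (born 1996), male, from Wenzhou, Zhejiang, M.S. candidate; main research interests: multimedia communication, information security.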
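Funding (translated from Chinese):
National Natural Science Foundation of China (U1736215, 61672302); Natural Science Foundation of Zhejiang Province (LZ15F020002, LY17F020010); Natural Science Foundation of Ningbo (2017A610123); Ningbo University Discipline Fund (XKXL1509, XKXL1503).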

Abstract

Abstract: Most existing forensic methods for digital speech are designed to detect one specific operation, so they cannot identify several different operations at once. To solve this problem, a universal forensic algorithm was proposed that simultaneously detects various operations, such as pitch modification, low-pass filtering, high-pass filtering, and noise addition. Firstly, the statistical moments of the Mel-Frequency Cepstral Coefficients (MFCC) were calculated, and cepstral mean and variance normalization was applied to the moments. Then, a multi-class classifier was constructed from multiple two-class classifiers. Finally, the classifier was used to identify the type of operation applied to the speech. Experimental results on the TIMIT and UME speech datasets show that the proposed universal features achieve a detection accuracy above 97% for the various speech operations, and that the detection accuracy remains above 96% in the MP3 compression robustness test.
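The feature step summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which statistical moments are used or how the MFCC frames are extracted, so the sketch assumes that the MFCC matrix of each utterance is already available, takes only the per-coefficient mean and variance as moments, and applies a plain zero-mean, unit-variance normalization across utterances. The function names `statistical_moments` and `mean_variance_normalize` are hypothetical.

```python
from statistics import fmean, pstdev, pvariance


def statistical_moments(frames):
    """Per-coefficient mean and variance over time for one utterance.

    `frames` is a list of MFCC vectors, one list of floats per frame.
    Using only mean and variance is an assumption of this sketch.
    """
    columns = list(zip(*frames))  # one tuple per cepstral coefficient
    means = [fmean(c) for c in columns]
    variances = [pvariance(c) for c in columns]
    return means + variances


def mean_variance_normalize(vectors, eps=1e-8):
    """Scale every feature dimension to zero mean and unit variance."""
    columns = list(zip(*vectors))
    mu = [fmean(c) for c in columns]
    sigma = [pstdev(c) for c in columns]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(v, mu, sigma)]
        for v in vectors
    ]


# Toy stand-ins for MFCC matrices: 3 utterances, 2 frames, 2 coefficients.
utterances = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [2.0, 5.0]],
    [[2.0, 2.0], [2.0, 6.0]],
]
features = [statistical_moments(u) for u in utterances]
normalized = mean_variance_normalize(features)
```

In practice the MFCC frames would come from a speech-processing library, and the normalization statistics would be estimated on the training set only and then reused for test utterances.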

Key words: speech forensics, Mel-Frequency Cepstral Coefficient (MFCC), operation trace, multi-class classifier

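Abstract (translated from Chinese): Existing research on digital speech forensics concentrates mainly on detecting one specific operation and cannot judge unrelated operations. To address this problem, a digital speech forensic method is proposed that can simultaneously detect four operations: pitch modification, low-pass filtering, high-pass filtering, and noise addition. First, normalized statistical-moment features of the Mel-Frequency Cepstral Coefficients (MFCC) of the speech are computed. Then, multiple two-class classifiers are trained on the features and combined by voting into a multi-class classifier. Finally, the multi-class classifier is used to classify the speech under test. Experimental results on the TIMIT and UME speech databases show that the normalized MFCC statistical-moment features achieve detection rates above 97% in all intra-database experiments, and that the detection rate remains above 96% in the MP3 compression robustness test.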
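Combining two-class classifiers by voting can be sketched as a one-vs-one scheme: one binary classifier per pair of classes, with the class that collects the most votes winning. The abstract does not name the binary classifier, so the sketch below substitutes a nearest-centroid rule as a stand-in; the toy 2-D features, the class labels, and the function names `train_one_vs_one` and `predict` are likewise assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_one_vs_one(samples, labels):
    """Build one binary classifier per pair of classes.

    Each binary classifier is just a pair of class centroids
    (a stand-in, not the classifier used in the paper).
    """
    by_class = {}
    for vector, label in zip(samples, labels):
        by_class.setdefault(label, []).append(vector)
    centroids = {label: centroid(vs) for label, vs in by_class.items()}
    return [
        ((a, centroids[a]), (b, centroids[b]))
        for a, b in combinations(sorted(centroids), 2)
    ]


def predict(classifiers, vector):
    """Let every pairwise classifier vote; return the most-voted class."""
    votes = Counter()
    for (a, ca), (b, cb) in classifiers:
        closer_to_a = squared_distance(vector, ca) <= squared_distance(vector, cb)
        votes[a if closer_to_a else b] += 1
    return votes.most_common(1)[0][0]


# Toy 2-D features for the unprocessed class and the four operations.
samples = [
    [0.0, 0.1], [0.2, 0.0],    # original
    [5.0, 0.1], [5.2, 0.0],    # pitch modification
    [0.0, 5.1], [0.1, 4.9],    # low-pass filtering
    [5.0, 5.0], [5.1, 5.2],    # high-pass filtering
    [9.9, 10.0], [10.1, 10.2], # noise addition
]
labels = ["original"] * 2 + ["pitch"] * 2 + ["lowpass"] * 2 \
    + ["highpass"] * 2 + ["noise"] * 2

classifiers = train_one_vs_one(samples, labels)
prediction = predict(classifiers, [4.8, 0.3])
```

With five classes the scheme trains 5 × 4 / 2 = 10 pairwise classifiers, and each test vector receives 10 votes.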

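Key words (translated from Chinese): speech forensics, Mel-Frequency Cepstral Coefficient (MFCC), processing trace, multi-class classifier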


XIANG Li, YAN Diqun, WANG Rangding, LI Xiaowen. Forensics algorithm of various operations for digital speech[J]. Journal of Computer Applications, 2019, 39(1): 126-130.

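XIANG Li, YAN Diqun, WANG Rangding, LI Xiaowen. Digital speech forensics algorithm for multiple processing traces[J]. Journal of Computer Applications (计算机应用), 2019, 39(1): 126-130. (Chinese-language citation, translated.)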

