Audio steganography detection model combining residual network and extreme gradient boosting

Journal of Computer Applications, 2021, Vol. 41, Issue (2): 449-455. DOI: 10.11772/j.issn.1001-9081.2020060775

Topic: Cyberspace Security


CHEN Lang, WANG Rangding, YAN Diqun, LIN Yuzhen

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo Zhejiang 315211, China

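Received: 2020-06-19; Revised: 2020-08-17; Published: 2021-02-10; Online: 2020-10-20
Corresponding author: WANG Rangding
About the authors: CHEN Lang (1997-), male, born in Bazhong, Sichuan, M.S. candidate; research interests: multimedia information security, information hiding, steganalysis. WANG Rangding (1962-), male, born in Tianshui, Gansu, Ph.D., professor, CCF member; research interests: multimedia information security, information hiding, steganalysis. YAN Diqun (1979-), male, born in Yuyao, Zhejiang, Ph.D., associate professor, CCF member; research interests: multimedia information security, digital forensics. LIN Yuzhen (1994-), male, born in Ningbo, Zhejiang, M.S. candidate; research interests: multimedia information security, steganalysis.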
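Supported by:
National Natural Science Foundation of China (U1736215, 61672302, 61901237); Natural Science Foundation of Zhejiang Province (LY20F020010, LY17F020010); Open Foundation of the Mobile Network Application Technology Key Laboratory of Zhejiang Province (F2018001); K.C. Wong Magna Fund in Ningbo University; Scientific Research Graduate Innovation Foundation of Ningbo University (IF2020131).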


Abstract

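Existing audio steganography detection methods have low accuracy in detecting audio steganography based on Syndrome-Trellis Codes (STC). Convolutional Neural Networks (CNN) are good at extracting abstract features, so a model for audio steganography detection was proposed that combines a Deep Residual Network (DRN) with eXtreme Gradient Boosting (XGBoost).

First, the input audio was preprocessed by a High-Pass Filter (HPF) with fixed parameters, and features were extracted by three convolutional layers. The first convolutional layer used the Truncated Linear Unit (TLU) activation function so that the model adapts to the distribution of steganographic signals at a low Signal-to-Noise Ratio (SNR). Second, more abstract features were extracted by five stages of residual blocks and pooling operations. Finally, the extracted high-dimensional features were passed through fully connected and Dropout layers and fed into an XGBoost model for classification.

The model was evaluated separately on STC steganography and on Least Significant Bit Matching (LSBM) steganography. The embedding rates were 0.5, 0.2 and 0.1 bps (bits per sample), i.e., an average of 0.5, 0.2 and 0.1 message bits embedded per audio sample. At these rates, the proposed model achieves average detection accuracies of:

- STC steganography with a submatrix height of 7: 73.27%, 70.16% and 65.18% respectively;
- LSBM steganography: 86.58%, 76.08% and 72.82% respectively.

The comparison baselines were traditional steganography detection methods based on handcrafted features and deep learning steganography detection methods. Against both, the proposed model improves the average detection accuracy for both steganographic algorithms by more than 10 percentage points.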

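Key words: Deep Residual Network (DRN), eXtreme Gradient Boosting (XGBoost), Syndrome-Trellis Codes (STC) steganography, Least Significant Bit Matching (LSBM) steganography, audio steganography detection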
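The front end described in the abstract has two parts: a fixed high-pass filter, followed by a first convolutional layer with TLU activation. The minimal sketch below applies both parts to a toy signal. The abstract does not state the filter coefficients or the TLU threshold, so the following are assumptions:

- the second-order difference kernel `(-1, 2, -1)`;
- the threshold `T = 3`;
- the hypothetical helper names `high_pass` and `tlu`.

In the model, the TLU is the activation of the first learned convolutional layer. Here it is applied directly to the filter residual, only to show the clipping.

```python
from typing import List, Sequence

# Assumed fixed HPF kernel (second-order difference). The abstract only says
# the filter has fixed parameters; these coefficients are illustrative.
HPF_KERNEL = (-1, 2, -1)


def high_pass(audio: Sequence[int], kernel: Sequence[int] = HPF_KERNEL) -> List[int]:
    """Apply a 3-tap filter centred on each sample, zero-padding both ends."""
    padded = [0, *audio, 0]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(audio))
    ]


def tlu(x: Sequence[float], threshold: float = 3) -> List[float]:
    """Truncated Linear Unit: identity on [-T, T], clipped to -T or T outside."""
    return [max(-threshold, min(threshold, v)) for v in x]


# A linear ramp stands in for smooth audio content: its interior
# second-order difference is zero, so the filter removes it.
cover = list(range(0, 20, 2))  # 0, 2, ..., 18
stego = cover.copy()
stego[5] += 1  # one +1 change, the kind LSBM makes

cover_residual = tlu(high_pass(cover))  # only the zero-padded edges survive
stego_residual = tlu(high_pass(stego))  # the change leaves a -1, 2, -1 trace
print(cover_residual)
print(stego_residual)
```

The last sample shows why truncation helps. Zero padding at the end of the ramp produces a residual of 20, which the TLU caps at 3. The small -1, 2, -1 trace left by the embedding change passes through unchanged.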

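An embedding rate in bps counts embedded message bits per sample, not modified samples. The sketch below embeds random bits with plain LSB matching to show the difference. A carrier sample whose LSB already equals its message bit is left untouched, so at 0.5 bps only about a quarter of all samples change.

The following details are assumptions made for illustration:

- spreading the bits over randomly chosen samples;
- the +1/-1 rule at the 16-bit limits;
- the hypothetical name `lsbm_embed`.

This is not the authors' embedding code. STC embedding, which additionally minimizes a distortion function, is not shown.

```python
import random
from typing import List, Tuple

INT16_MIN, INT16_MAX = -32768, 32767


def lsbm_embed(
    cover: List[int], rate: float, rng: random.Random
) -> Tuple[List[int], List[int], List[int]]:
    """Embed round(rate * N) random message bits by LSB matching (a sketch).

    Returns the stego samples plus the carrier positions and bits so the
    result can be checked. If a carrier's LSB already equals its bit, the
    sample is kept; otherwise it moves by +1 or -1 at random, or inward at
    the 16-bit limits so it stays in range.
    """
    stego = list(cover)
    n_bits = round(rate * len(cover))
    positions = rng.sample(range(len(cover)), n_bits)  # distinct carriers
    bits = [rng.getrandbits(1) for _ in range(n_bits)]  # random message
    for pos, bit in zip(positions, bits):
        if (stego[pos] & 1) != bit:
            if stego[pos] == INT16_MAX:
                step = -1
            elif stego[pos] == INT16_MIN:
                step = 1
            else:
                step = rng.choice((-1, 1))
            stego[pos] += step  # either direction flips the LSB
    return stego, positions, bits


rng = random.Random(2021)
cover = [rng.randint(-2000, 2000) for _ in range(10_000)]
stego, positions, bits = lsbm_embed(cover, rate=0.5, rng=rng)
changed = sum(c != s for c, s in zip(cover, stego))
print(f"carriers: {len(positions)}, changed: {changed}, "
      f"change rate: {changed / len(cover):.3f}")
```

With a payload of 0.5 bps on 10 000 samples, 5 000 samples carry a bit and roughly 2 500 of them are modified. That is a change rate near rate/2, the expected value for LSBM with a random message.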

CLC number: TP391.4

CHEN Lang, WANG Rangding, YAN Diqun, LIN Yuzhen. Audio steganography detection model combining residual network and extreme gradient boosting[J]. Journal of Computer Applications, 2021, 41(2): 449-455.

