Environment sound recognition based on lightweight deep neural network

Journal of Computer Applications, 2020, Vol. 40, Issue (11): 3172-3177. DOI: 10.11772/j.issn.1001-9081.2020030433


YANG Lei, ZHAO Hongdong

School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300300, China

Received: 2020-04-08  Revised: 2020-07-09  Publication date: 2020-11-10  Release date: 2020-07-27
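Corresponding author: ZHAO Hongdong (born 1968), male, native of Cangzhou, Hebei; Ph.D., professor and doctoral supervisor. Research interests: semiconductor electron optics, speech signal processing. E-mail: zhaohd@hebut.edu.cn
About the author: YANG Lei (born 1978), male, native of Dunhua, Jilin; Ph.D. candidate, CCF member. Research interests: intelligent information processing.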
Supported by: Fund of Electro-optical Information and Security Control Key Laboratory (614210701041705).


Abstract



Abstract: Conventional Convolutional Neural Network (CNN) models contain a large number of redundant parameters. To address this problem, two lightweight network models, Fnet1 and Fnet2, were proposed based on the Fire module, the core structure of SqueezeNet. Then, considering the characteristics of distributed data collection and processing on mobile terminals, a new network model named FnetDNN was built on Fnet2 by fusing Fnet2 with a Deep Neural Network (DNN) according to Dempster-Shafer (D-S) evidence theory. Firstly, a neural network named Cnet with four convolutional layers was built as the baseline, with Mel Frequency Cepstral Coefficient (MFCC) as the input feature, and Fnet1, Fnet2 and Cnet were compared in terms of network structure, computational cost, number of convolution kernel parameters and recognition accuracy. Fnet1 reached a classification accuracy of 86.7% while using only 10.3% of the parameters of Cnet. Secondly, MFCC and the global feature vector were fed into the FnetDNN model, raising its recognition accuracy to 94.4%. Experimental results indicate that the Fnet models can not only compress redundant parameters but also be fused with other networks, which makes them extensible.
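The abstract credits the Fire module with keeping Fnet1 and Fnet2 lightweight. A Fire module first applies a layer of 1x1 "squeeze" filters to shrink the channel count, then runs parallel 1x1 and 3x3 "expand" filters whose outputs are concatenated. The sketch below counts trainable parameters for one Fire module and for a single plain 3x3 convolution with the same input and output widths. The channel sizes are borrowed from SqueezeNet's fire2 layer purely for illustration; this page does not give the Fnet1/Fnet2 configurations, so the resulting ratio is not meant to reproduce the 10.3% reported for Fnet1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FireConfig:
    """Channel sizes of one Fire module (field names follow the SqueezeNet paper)."""

    in_channels: int
    squeeze_1x1: int  # s1x1: number of 1x1 filters in the squeeze layer
    expand_1x1: int   # e1x1: number of 1x1 filters in the expand layer
    expand_3x3: int   # e3x3: number of 3x3 filters in the expand layer


def conv_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Trainable parameters of a k x k 2-D convolution (weights plus optional biases)."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)


def fire_params(cfg: FireConfig, bias: bool = True) -> int:
    """Squeeze (1x1) followed by parallel 1x1 and 3x3 expand convolutions."""
    squeeze = conv_params(cfg.in_channels, cfg.squeeze_1x1, 1, bias)
    expand1 = conv_params(cfg.squeeze_1x1, cfg.expand_1x1, 1, bias)
    expand3 = conv_params(cfg.squeeze_1x1, cfg.expand_3x3, 3, bias)
    return squeeze + expand1 + expand3


def plain_conv_params(cfg: FireConfig, bias: bool = True) -> int:
    """One 3x3 convolution with the same input and (concatenated) output channels."""
    return conv_params(cfg.in_channels, cfg.expand_1x1 + cfg.expand_3x3, 3, bias)


if __name__ == "__main__":
    # Illustrative sizes from SqueezeNet's fire2 layer, NOT the Fnet1/Fnet2 settings.
    cfg = FireConfig(in_channels=96, squeeze_1x1=16, expand_1x1=64, expand_3x3=64)
    fire, plain = fire_params(cfg), plain_conv_params(cfg)
    print(f"Fire module: {fire} params, plain 3x3 conv: {plain} params, ratio {fire / plain:.3f}")
```

With these sizes the Fire module needs 11920 parameters against 110720 for the plain convolution, roughly a ninefold reduction. Most of the saving comes from feeding the 3x3 expand filters only 16 squeezed channels instead of all 96 input channels.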

Key words: environment sound recognition, Deep Neural Network (DNN), Dempster-Shafer (D-S) evidence theory, Mel Frequency Cepstral Coefficient (MFCC)
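FnetDNN combines Fnet2 and the DNN with D-S evidence theory, but this page does not say how the network outputs are turned into evidence. The sketch below assumes one common construction: each network's softmax output is used as a basic probability assignment whose focal elements are single classes, and the two assignments are merged with Dempster's rule of combination. The class names, the probability values and the helper `singleton_masses` are hypothetical and only illustrate the rule; they are not taken from the paper.

```python
from itertools import product
from typing import Dict, FrozenSet, Sequence

# A mass function (basic probability assignment): focal set -> mass.
Mass = Dict[FrozenSet[str], float]


def singleton_masses(labels: Sequence[str], probs: Sequence[float]) -> Mass:
    """Hypothetical helper: treat a softmax output as masses on singleton classes."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    return {frozenset([lab]): float(p) for lab, p in zip(labels, probs)}


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions on the same frame."""
    combined: Dict[FrozenSet[str], float] = {}
    conflict = 0.0  # K: total mass that falls on empty intersections
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict (K = 1): the two sources cannot be combined")
    # Renormalise the non-conflicting mass by 1 - K.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    labels = ["dog_bark", "siren", "drilling"]            # hypothetical class names
    cnn_out = singleton_masses(labels, [0.6, 0.3, 0.1])   # hypothetical Fnet2-style output
    dnn_out = singleton_masses(labels, [0.5, 0.1, 0.4])   # hypothetical DNN output
    fused = dempster_combine(cnn_out, dnn_out)
    for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
        print(sorted(focal)[0], round(mass, 4))
```

When every focal element is a single class, as in this example, the rule reduces to multiplying the two probability vectors element-wise and renormalising, so "dog_bark" ends up with about 0.81 of the fused mass. The general set-based loop is kept so that mass placed on larger subsets, for example on the whole frame to express ignorance, is handled as well.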

CLC number: TP183


YANG Lei, ZHAO Hongdong. Environment sound recognition based on lightweight deep neural network[J]. Journal of Computer Applications, 2020, 40(11): 3172-3177.

