计算机应用 ›› 2020, Vol. 40 ›› Issue (11): 3172-3177. DOI: 10.11772/j.issn.1001-9081.2020030433

• 人工智能 •

基于轻量级深度神经网络的环境声音识别

杨磊, 赵红东   

  1. 河北工业大学 电子信息工程学院, 天津 300300
  • 收稿日期:2020-04-08 修回日期:2020-07-09 出版日期:2020-11-10 发布日期:2020-07-27
  • 通讯作者: 赵红东(1968-),男,河北沧州人,教授,博士生导师,博士,主要研究方向:半导体电子光学、语音信号处理;zhaohd@hebut.edu.cn
  • 作者简介:杨磊(1978-),男,吉林敦化人,博士研究生,CCF会员,主要研究方向:智能信息处理
  • 基金资助:
    光电信息控制和安全技术重点实验室基金资助项目(614210701041705)。

Environment sound recognition based on lightweight deep neural network

YANG Lei, ZHAO Hongdong   

  1. School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300300, China
  • Received: 2020-04-08 Revised: 2020-07-09 Online: 2020-07-27 Published: 2020-11-10
  • Supported by:
    This work is partially supported by the Fund of Electro-optical Information and Security Control Key Laboratory (614210701041705).

摘要: 针对传统卷积神经网络(CNN)模型存在大量冗余参数的问题,提出了两个基于SqueezeNet核心结构Fire模块的轻量级网络模型Fnet1和Fnet2。之后结合移动端分布式数据采集和处理的特点,在Fnet2模型基础上,依据Dempster-Shafer(D-S)证据理论将Fnet2与深度神经网络(DNN)融合,提出新的网络模型FnetDNN。首先,建立一个具有四层卷积层的神经网络Cnet作为基准,以梅尔倒谱系数(MFCC)作为特征输入来对比分析Fnet1、Fnet2和Cnet的网络结构特点、计算量、卷积核参数数量及识别准确率,结论是Fnet1仅使用Cnet参数数量的10.3%就可达到86.7%的分类准确率;然后,将MFCC与全局特征向量输入到FnetDNN模型中,使得该模型的识别准确率提高到了94.4%。实验结果表明,Fnet网络模型不仅可以压缩冗余参数,还可以与其他网络相融合,具备模型扩展能力。

关键词: 环境声音识别, 深度神经网络, D-S证据理论, 梅尔倒谱系数

Abstract: Existing Convolutional Neural Network (CNN) models contain a large number of redundant parameters. To address this problem, two lightweight network models, Fnet1 and Fnet2, were proposed based on the Fire module, the core structure of SqueezeNet. Then, in view of the characteristics of distributed data collection and processing on mobile terminals, a new network model named FnetDNN was proposed by fusing Fnet2 with a Deep Neural Network (DNN) according to Dempster-Shafer (D-S) evidence theory. Firstly, a neural network named Cnet with four convolutional layers was built as the benchmark, and Mel Frequency Cepstral Coefficient (MFCC) was used as the input feature to compare Fnet1, Fnet2 and Cnet in terms of network structure characteristics, computational cost, number of convolution kernel parameters and recognition accuracy. The results showed that Fnet1 reached a classification accuracy of 86.7% using only 10.3% of the parameters of Cnet. Secondly, MFCC and the global feature vector were input into the FnetDNN model, which raised the recognition accuracy to 94.4%. Experimental results indicate that the Fnet network models can not only compress redundant parameters but also be fused with other networks, showing that they are extensible.

Key words: environment sound recognition, Deep Neural Network (DNN), Dempster-Shafer (D-S) evidence theory, Mel Frequency Cepstral Coefficient (MFCC)
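
The abstract states that Fnet2 and the DNN are fused according to D-S evidence theory but does not give the mass assignment. As a rough illustration only, the sketch below applies Dempster's rule of combination under the assumption that each network's softmax output is treated as a mass function over singleton classes; the function name `dempster_combine` and the example probabilities are hypothetical and are not taken from the paper.

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass vectors over the same singleton classes.

    Assumption (not from the paper): all mass sits on single classes and each
    vector sums to 1, so the conflict is K = 1 - sum_i m1[i] * m2[i].
    """
    if len(m1) != len(m2):
        raise ValueError("mass vectors must cover the same classes")
    # Mass on which both sources agree, class by class.
    agreement = [a * b for a, b in zip(m1, m2)]
    total_agreement = sum(agreement)  # equals 1 - K
    if total_agreement <= 0.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalise by 1 - K so the fused masses sum to 1.
    return [x / total_agreement for x in agreement]


# Hypothetical softmax outputs for three sound classes.
p_cnn = [0.6, 0.3, 0.1]  # e.g. from a convolutional branch
p_dnn = [0.5, 0.1, 0.4]  # e.g. from a fully connected branch
fused = dempster_combine(p_cnn, p_dnn)
predicted_class = fused.index(max(fused))
```

With singleton-only masses the rule reduces to a normalised element-wise product, so a class that both branches favour is reinforced while a class favoured by only one branch is suppressed.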

中图分类号: