Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (12): 3515-3521. DOI: 10.11772/j.issn.1001-9081.2019040678

• Artificial intelligence •

Environmental sound classification method based on Mel-frequency cepstral coefficient, deep convolution and Bagging

WANG Tianrui (王天锐), BAO Qianyue (鲍骞月), QIN Pinle (秦品乐)

  1. School of Data Science and Technology, North University of China, Taiyuan, Shanxi 030051, China
  • Received: 2019-04-22 Revised: 2019-07-07 Online: 2019-12-10 Published: 2019-12-17
  • About the authors: WANG Tianrui (born 1997), male, from Chengdu, Sichuan; research interests: deep learning and machine intelligence. BAO Qianyue (born 1998), male, from Shuozhou, Shanxi; research interests: deep learning and machine vision. QIN Pinle (born 1978), male, from Changzhi, Shanxi; associate professor, Ph.D., CCF member; research interests: big data, machine vision and 3D reconstruction.
  • Contact: QIN Pinle

Abstract: Traditional environmental sound classification models extract the features of environmental sound insufficiently, and the fully connected layers of a convolutional neural network are prone to over-fitting when the network is used for environmental sound classification. To address these problems, an environmental sound classification method combining Mel-Frequency Cepstral Coefficients (MFCC), deep convolution and the Bagging algorithm was proposed. Firstly, an MFCC feature model was built from the original audio files by pre-emphasis, windowing, discrete Fourier transform, Mel filter transformation and discrete cosine mapping. Secondly, the feature model was fed into a deep convolutional network for a second stage of feature extraction. Finally, drawing on the idea of reinforcement learning, the Bagging ensemble algorithm was used to combine four models, namely a linear discriminant analyzer, a Support Vector Machine (SVM), Softmax regression and eXtreme Gradient Boosting (XGBoost), and the network output was classified by voting. The experimental results show that the proposed method effectively improves both the extraction of environmental sound features and the resistance of deep networks to over-fitting in environmental sound classification.

Key words: environmental sound classification, Mel-Frequency Cepstral Coefficient (MFCC), Bagging ensemble algorithm, feature extraction, deep learning
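
The first stage named in the abstract (pre-emphasis, windowing, discrete Fourier transform, Mel filtering, discrete cosine mapping) can be sketched as follows. This is only an illustrative outline using the Python standard library, not the authors' implementation: the pre-emphasis coefficient 0.97, the Hamming window, the frame length, hop size, filter count, coefficient count and the synthetic 440 Hz test tone are all assumptions chosen to keep the example small, and the naive DFT would be replaced by an FFT in practice.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]; alpha is an assumed, commonly used value.
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frames_with_hamming(signal, frame_len, hop):
    # Split into overlapping frames and taper each one with a Hamming window.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    return [
        [signal[start + n] * window[n] for n in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def power_spectrum(frame):
    # Naive O(N^2) DFT, keeping the non-negative-frequency bins 0..N/2.
    n_fft = len(frame)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        spectrum.append(abs(acc) ** 2 / n_fft)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the Mel scale.
    mel_max = hz_to_mel(sample_rate / 2)
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank


def dct2(values, n_coeffs):
    # Unnormalised DCT-II, truncated to the first n_coeffs coefficients.
    n = len(values)
    return [
        sum(values[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n_coeffs)
    ]


def mfcc(signal, sample_rate, frame_len=64, hop=32, n_filters=8, n_coeffs=5):
    emphasized = pre_emphasis(signal)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frames_with_hamming(emphasized, frame_len, hop):
        power = power_spectrum(frame)
        energies = [sum(f * p for f, p in zip(filt, power)) for filt in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
        features.append(dct2(log_energies, n_coeffs))
    return features


# Synthetic 440 Hz tone sampled at 8 kHz, standing in for a real audio clip.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc(tone, sample_rate)
print(len(coeffs), "frames,", len(coeffs[0]), "coefficients per frame")
```

The resulting frames-by-coefficients matrix is the kind of two-dimensional feature map that could then be passed to a convolutional network for the second stage of feature extraction.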

CLC Number:
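
The final stage described in the abstract, Bagging with voting over several classifiers, can be illustrated with the minimal sketch below. It is not the authors' code: the `NearestCentroid` class is a hypothetical stand-in for the four base models named in the abstract (linear discriminant analyzer, SVM, Softmax regression, XGBoost), which need external libraries, and the toy two-dimensional features and the labels `dog_bark` and `siren` are invented for illustration. What the sketch does show is the bagging mechanism itself: each learner is trained on a bootstrap resample of the training set, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical stand-in base learner; the abstract names LDA, SVM,
    Softmax regression and XGBoost as the actual ensemble members."""

    def fit(self, features, labels):
        sums, counts = {}, Counter(labels)
        for x, y in zip(features, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda y: sq_dist(self.centroids[y]))


def bagging_fit(features, labels, n_learners=4, seed=0):
    # Train each learner on a bootstrap sample (drawn with replacement).
    rng = random.Random(seed)
    n = len(features)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        learner = NearestCentroid().fit([features[i] for i in idx], [labels[i] for i in idx])
        learners.append(learner)
    return learners


def vote(learners, x):
    # Majority vote; on a tie, the label that was counted first wins.
    ballots = Counter(learner.predict(x) for learner in learners)
    return ballots.most_common(1)[0][0]


# Toy, well-separated 2-D "features"; in the paper's setting these would be
# the outputs of the convolutional network.
features = [(0.1 * i, 0.1 * i) for i in range(20)] + [(5 + 0.1 * i, 5 + 0.1 * i) for i in range(20)]
labels = ["dog_bark"] * 20 + ["siren"] * 20
learners = bagging_fit(features, labels)
print(vote(learners, (0.5, 0.5)), vote(learners, (6.0, 6.0)))
```

With an even number of voters, ties are possible, so a real system would need an explicit tie-breaking rule (for example, weighting votes by validation accuracy); the abstract does not say how the authors handle this.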