Sound recognition based on optimized orthogonal matching pursuit and deep belief network

doi:10.11772/j.issn.1001-9081.2017.02.0505

Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (2): 505-511. DOI: 10.11772/j.issn.1001-9081.2017.02.0505

CHEN Qiuju, LI Ying

College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350116, China

Received: 2016-06-12 Revised: 2016-08-04 Online: 2017-02-10 Published: 2017-02-11
Corresponding author: LI Ying, fj_liying@fzu.edu.cn
About the authors: CHEN Qiuju (born 1989), female, from Zunyi, Guizhou, M.S. candidate; research interests: multimedia data retrieval, sound event detection. LI Ying (born 1964), male, from Minqing, Fujian, professor, Ph.D.; research interests: multimedia data retrieval, sound event detection, information security.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61075022).

Abstract

Abstract: To address the influence of various environmental sounds on sound event recognition, a sound event recognition method based on Optimized Orthogonal Matching Pursuit (OOMP) and Deep Belief Network (DBN) is proposed. First, the Particle Swarm Optimization (PSO) algorithm is used to optimize the Orthogonal Matching Pursuit (OMP) sparse decomposition of the sound signal. This makes the OMP sparse decomposition fast while preserving the main body of the sound signal and suppressing the influence of noise. Next, Mel-Frequency Cepstral Coefficient (MFCC), OMP time-frequency and pitch features are extracted from the reconstructed sound signal and combined into a composite feature, called the OOMP feature. Finally, a DBN is used to learn from the extracted OOMP features and to recognize 40 classes of sound events in different environments at different Signal-to-Noise Ratios (SNRs). The experimental results show that combining the OOMP feature with a DBN is suitable for sound event recognition under various environmental sounds and recognizes sound events effectively in those environments; even at an SNR of 0 dB, it still maintains an average recognition rate of 60%.

Key words: sound event recognition, Orthogonal Matching Pursuit (OMP), sparse decomposition, Particle Swarm Optimization (PSO), Deep Belief Network (DBN)
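
The sparse decomposition step named in the abstract can be illustrated with a minimal sketch of plain OMP. This is not the authors' implementation: the random Gaussian dictionary, the function name `omp`, and all parameter values are assumptions chosen for illustration, and atom selection here is an exhaustive search rather than the PSO-based search the paper proposes.

```python
import numpy as np


def omp(signal, dictionary, n_atoms):
    """Greedy OMP: approximate `signal` with `n_atoms` columns of `dictionary`."""
    residual = signal.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(n_atoms):
        # Pick the atom most correlated with the current residual.
        # (Exhaustive search; an assumption standing in for the PSO search.)
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected atoms jointly by least squares, which keeps
        # the residual orthogonal to every atom chosen so far.
        atoms = dictionary[:, support]
        coeffs, *_ = np.linalg.lstsq(atoms, signal, rcond=None)
        residual = signal - atoms @ coeffs
    return support, coeffs, residual


# Hypothetical demo data: a 3-atom signal plus weak additive noise.
rng = np.random.default_rng(0)
n_samples, n_dict = 64, 256
D = rng.standard_normal((n_samples, n_dict))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms
clean = D[:, [5, 80, 200]] @ np.array([2.0, -1.5, 1.0])
noisy = clean + 0.05 * rng.standard_normal(n_samples)

support, coeffs, residual = omp(noisy, D, n_atoms=3)
reconstructed = D[:, support] @ coeffs  # sparse approximation of the signal
```

Stopping after a small number of atoms keeps the dominant structure of the signal in `reconstructed` and leaves most of the unstructured noise in `residual`, which is the general idea behind extracting features from the reconstructed signal.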

CLC Number: TP391.42

Cite this article: CHEN Qiuju, LI Ying. Sound recognition based on optimized orthogonal matching pursuit and deep belief network[J]. Journal of Computer Applications, 2017, 37(2): 505-511.

