Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (2): 505-511. DOI: 10.11772/j.issn.1001-9081.2017.02.0505

• Artificial Intelligence •

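Sound recognition based on optimized orthogonal matching pursuit and deep belief network

CHEN Qiuju, LI Ying

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116
  • Received: 2016-06-12 Revised: 2016-08-04 Online: 2017-02-10 Published: 2017-02-11
  • Corresponding author: LI Ying, fj_liying@fzu.edu.cn
  • About the authors: CHEN Qiuju (born 1989), female, from Zunyi, Guizhou, master's student; main research interests: multimedia data retrieval and sound event detection. LI Ying (born 1964), male, from Minqing, Fujian, professor, Ph.D.; main research interests: multimedia data retrieval, sound event detection, and information security.
  • Funding:
    National Natural Science Foundation of China (61075022).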

Sound recognition based on optimized orthogonal matching pursuit and deep belief network

CHEN Qiuju, LI Ying   

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350116, China
  • Received:2016-06-12 Revised:2016-08-04 Online:2017-02-10 Published:2017-02-11
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61075022).

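Abstract: To address the effect of various environmental sounds on sound event recognition, a sound event recognition method based on Optimized Orthogonal Matching Pursuit (OOMP) and Deep Belief Network (DBN) is proposed. First, the Particle Swarm Optimization (PSO) algorithm is used to optimize the sparse decomposition of Orthogonal Matching Pursuit (OMP), which speeds up the OMP sparse decomposition while retaining the main body of the sound signal and suppressing the effect of noise on it. Next, Mel-Frequency Cepstral Coefficient (MFCC), OMP time-frequency, and fundamental frequency (Pitch) features are extracted from the reconstructed sound signal and combined into the composite OOMP feature. Finally, a DBN learns from the extracted OOMP features, and 40 classes of sound events are recognized in different environments at different Signal-to-Noise Ratios (SNRs). Experimental results show that combining the OOMP feature with a DBN is suitable for sound event recognition under various environmental sounds and recognizes sound events effectively in those environments; even at an SNR of 0 dB, it still maintains an average recognition rate of 60%.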

Key words: sound event recognition, orthogonal matching pursuit, sparse decomposition, particle swarm optimization, deep belief network

Abstract: To address the influence of various environmental sounds on sound event recognition, a sound event recognition method based on Optimized Orthogonal Matching Pursuit (OOMP) and Deep Belief Network (DBN) was proposed. Firstly, the Particle Swarm Optimization (PSO) algorithm was used to optimize the Orthogonal Matching Pursuit (OMP) sparse decomposition of the sound signal, which achieved fast OMP sparse decomposition while preserving the main body of the sound signal and reducing the influence of noise. Then, a composite feature, called the OOMP feature, was formed from the Mel-Frequency Cepstral Coefficient (MFCC), time-frequency OMP feature and Pitch feature extracted from the reconstructed sound signal. Finally, the DBN was employed to learn the OOMP feature and to recognize 40 classes of sound events in different environments and at different Signal-to-Noise Ratios (SNRs). The experimental results show that the proposed method, which combines the OOMP feature with a DBN, is suitable for sound event recognition in various environments and recognizes sound events effectively; it still maintains an average accuracy rate of 60% even when the SNR is 0 dB.

Key words: sound event recognition, Orthogonal Matching Pursuit (OMP), sparse decomposition, Particle Swarm Optimization (PSO), Deep Belief Network (DBN)
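
For readers unfamiliar with the sparse decomposition step named in the abstract, the following is a minimal sketch of plain greedy OMP in pure Python. It is not the authors' PSO-optimized variant: the atom search here is an exhaustive scan, which is the step the abstract says PSO is used to speed up. The dictionary (DCT atoms plus unit spikes), the toy signal, and the helper names `dot`, `unit`, `solve`, and `omp` are all assumptions made for illustration; the abstract does not specify the dictionary or the sparsity level.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def solve(A, b):
    """Solve a small square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def omp(atoms, signal, n_atoms):
    """Plain OMP over unit-norm atoms; returns ({atom index: coefficient}, residual)."""
    residual = signal[:]
    support, coefs = [], []
    for _ in range(n_atoms):
        # Greedy step: exhaustive scan for the atom most correlated with the residual.
        best = max((i for i in range(len(atoms)) if i not in support),
                   key=lambda i: abs(dot(atoms[i], residual)))
        support.append(best)
        # Orthogonal step: least-squares refit on all selected atoms (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], signal) for i in support]
        coefs = solve(gram, rhs)
        approx = [sum(c * atoms[i][t] for c, i in zip(coefs, support))
                  for t in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
    return dict(zip(support, coefs)), residual

# Hypothetical overcomplete dictionary: 16 DCT-II atoms followed by 16 unit spikes.
n = 16
dct_atoms = [unit([math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
             for k in range(n)]
spike_atoms = [[1.0 if t == k else 0.0 for t in range(n)] for k in range(n)]
atoms = dct_atoms + spike_atoms

# Toy "sound": two dictionary atoms plus a small deterministic disturbance.
noise = [0.005 * math.sin(1.7 * t) for t in range(n)]
signal = [2.0 * dct_atoms[3][t] - 1.5 * dct_atoms[7][t] + noise[t] for t in range(n)]

coefs, residual = omp(atoms, signal, 2)
print(sorted(coefs.items()))               # selected atoms and their coefficients
print(math.sqrt(dot(residual, residual)))  # energy left outside the two atoms
```

Reconstructing the signal from the selected atoms alone keeps its dominant structure and leaves most of the disturbance in the residual, which is the noise-suppression effect the abstract attributes to the decomposition. In the proposed method, features such as MFCC and pitch would then be extracted from that reconstruction.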

CLC number: