Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (5): 1426-1429. DOI: 10.11772/j.issn.1001-9081.2016.05.1426

• Virtual reality and digital media •

Audio feature extraction algorithm based on weight tensor of sparse representation

LIN Jing1, YANG Jichen2, ZHANG Xueyuan2, LI Xinchao2

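  1. Department of Mechanical and Electrical Information, Maoming Vocational and Technical College, Maoming, Guangdong 525000, China;
    2. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China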
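  • Received: 2015-10-14 Revised: 2016-01-18 Online: 2016-05-10 Published: 2016-05-09
  • Corresponding author: LIN Jing
  • About the authors: LIN Jing (born 1982), female, from Maoming, Guangdong; lecturer, M.S.; main research interest: audio and video signal processing. YANG Jichen (born 1980), male, from Jieshou, Anhui; associate research fellow, Ph.D.; main research interest: audio and video signal processing. ZHANG Xueyuan (born 1987), male, from Shijiazhuang, Hebei; Ph.D. candidate; main research interest: audio and video signal processing. LI Xinchao (born 1980), male, from Nanyang, Henan; Ph.D. candidate; main research interests: intelligent optimization and signal processing.
  • Supported by:
    National Natural Science Foundation of China (61301300).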

Audio feature extraction algorithm based on weight tensor of sparse representation

LIN Jing1, YANG Jichen2, ZHANG Xueyuan2, LI Xinchao2   

  1. Department of Mechanical and Electrical Information, Maoming Vocational and Technical College, Maoming, Guangdong 525000, China;
    2. School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641, China
  • Received: 2015-10-14 Revised: 2016-01-18 Online: 2016-05-10 Published: 2016-05-09
  • Supported by:
    This work is supported by the National Natural Science Foundation of China (61301300).

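Abstract: To better characterize non-stationary audio signals, a time-frequency audio feature extraction method based on a Gabor dictionary and the weight tensor of a sparse representation is proposed. The method encodes the audio signal as a sparse weight vector over the Gabor dictionary and then rearranges the elements of that vector into a tensor whose modes describe the time, frequency, and duration properties of the signal, giving a joint time-frequency-duration representation. The tensor is factorized, and the resulting frequency factors and duration factors are concatenated to form the audio feature. Because factorizing a sparse tensor is prone to over-fitting, a factorization algorithm with a self-adjusting penalty parameter is proposed and further refined. Experimental results show that, on a 15-class sound effect classification task, the proposed feature improves classification performance by 28.0%, 19.8%, and 6.7% over, respectively, the conventional Mel-Frequency Cepstral Coefficient (MFCC) feature, the MFCC+MP feature obtained by concatenating MFCC with features computed by the Matching Pursuit (MP) algorithm, and the nonuniform scale-frequency map feature.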

Keywords: sparse representation, tensor factorization, sound effect classification, time-frequency feature

Abstract: A joint time-frequency audio feature extraction algorithm based on a Gabor dictionary and the weight tensor of a sparse representation was proposed to describe the characteristics of non-stationary audio signals. Conventional sparse representation uses a predefined dictionary to encode the audio signal as a sparse weight vector. In this paper, the elements of the weight vector were reorganized into a tensor. The modes of the tensor characterize the time, frequency, and duration properties of the signal respectively, making the tensor a joint time-frequency-duration representation of the signal. The tensor was then decomposed, and the resulting frequency factors and duration factors were concatenated as the audio feature. To address the over-fitting problem of sparse tensor factorization, a factorization algorithm with an automatically adjusted penalty coefficient was proposed. The experimental results show that, in 15-category audio classification, the proposed feature outperforms the Mel-Frequency Cepstral Coefficient (MFCC) feature, the MFCC+MP feature formed by concatenating MFCC and Matching Pursuit (MP) features, and the nonuniform scale-frequency map feature by 28.0%, 19.8%, and 6.7% respectively.

Key words: sparse representation, tensor factorization, audio effect classification, time-frequency feature
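
The pipeline summarized in the abstract (sparse coding over a Gabor dictionary, reshaping the weights into a time-frequency-duration tensor, factorizing, and concatenating the frequency and duration factors) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions that are not taken from the paper: the atom grid (`centers`, `freqs`, `scales`), the real-valued Gabor atom formula, plain matching pursuit as the sparse coder, and a CP decomposition fitted by alternating least squares with a fixed ridge penalty `lam`. The fixed penalty stands in for the paper's self-adjusting penalty scheme, whose details are not given here. All function names are hypothetical.

```python
import numpy as np

def gabor_atom(n, center, freq, scale):
    """Unit-norm real Gabor atom: Gaussian window of width `scale` centred at
    `center` (samples), modulated at `freq` (cycles per sample)."""
    t = np.arange(n)
    g = np.exp(-np.pi * ((t - center) / scale) ** 2) * np.cos(2 * np.pi * freq * (t - center))
    return g / np.linalg.norm(g)

def build_dictionary(n, centers, freqs, scales):
    """Columns are ordered so that column index = (i_time * F + i_freq) * S + i_scale."""
    atoms = [gabor_atom(n, c, f, s) for c in centers for f in freqs for s in scales]
    return np.stack(atoms, axis=1)

def matching_pursuit(x, D, n_iter):
    """Plain matching pursuit: repeatedly pick the atom most correlated with the residual."""
    w = np.zeros(D.shape[1])
    r = np.asarray(x, dtype=float).copy()
    for _ in range(n_iter):
        corr = D.T @ r
        k = int(np.argmax(np.abs(corr)))
        w[k] += corr[k]
        r -= corr[k] * D[:, k]
    return w

def cp_als_ridge(X, rank, lam=1e-2, n_iter=50, seed=0):
    """CP decomposition of a 3-way tensor by ALS with a FIXED ridge penalty `lam`
    (an assumption; not the self-adjusting scheme described in the abstract)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            a, b = [factors[m] for m in range(3) if m != mode]
            # Khatri-Rao product, row order matching the unfolding below
            kr = np.einsum('ir,jr->ijr', a, b).reshape(-1, rank)
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            gram = (a.T @ a) * (b.T @ b) + lam * np.eye(rank)
            factors[mode] = np.linalg.solve(gram, (unfolded @ kr).T).T
    return factors

def extract_feature(x, centers, freqs, scales, rank=2, n_atoms=10):
    """Sparse-code x, reshape weights to (time, frequency, duration), factorize,
    and concatenate the frequency and duration factors."""
    D = build_dictionary(len(x), centers, freqs, scales)
    w = matching_pursuit(x, D, n_atoms)
    tensor = w.reshape(len(centers), len(freqs), len(scales))
    _, freq_factor, dur_factor = cp_als_ridge(tensor, rank)
    return np.concatenate([freq_factor.ravel(), dur_factor.ravel()])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(256)  # stand-in for one audio frame
    feat = extract_feature(x, centers=[32, 96, 160, 224],
                           freqs=[0.05, 0.1, 0.2], scales=[16.0, 32.0])
    print(feat.shape)
```

In this sketch the time factor is discarded and only the frequency and duration factors are kept, which mirrors the abstract's description; the feature length is therefore `(len(freqs) + len(scales)) * rank`. CP factors are only determined up to scaling and sign, so a practical system would likely normalize them before classification.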

CLC number: