Audio feature extraction algorithm based on weight tensor of sparse representation

doi:10.11772/j.issn.1001-9081.2016.05.1426

Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (5): 1426-1429. DOI: 10.11772/j.issn.1001-9081.2016.05.1426

LIN Jing¹, YANG Jichen², ZHANG Xueyuan², LI Xinchao²

1. Department of Mechanical and Electrical Information, Maoming Vocational and Technical College, Maoming, Guangdong 525000, China;
2. School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641, China

Received: 2015-10-14 Revised: 2016-01-18 Online: 2016-05-10 Published: 2016-05-09
Corresponding author: LIN Jing
About the authors: LIN Jing (born 1982), female, from Maoming, Guangdong, lecturer, M.S.; research interest: audio and video signal processing. YANG Jichen (born 1980), male, from Jieshou, Anhui, associate research fellow, Ph.D.; research interest: audio and video signal processing. ZHANG Xueyuan (born 1987), male, from Shijiazhuang, Hebei, Ph.D. candidate; research interest: audio and video signal processing. LI Xinchao (born 1980), male, from Nanyang, Henan, Ph.D. candidate; research interests: intelligent optimization and signal processing.
Supported by:
This work is supported by the National Natural Science Foundation of China (61301300).

Abstract

Abstract: To better describe the characteristics of non-stationary audio signals, a joint time-frequency audio feature extraction algorithm based on a Gabor dictionary and the weight tensor of a sparse representation is proposed. Conventional sparse representation uses a predefined dictionary to encode an audio signal as a sparse weight vector. In the proposed method, the signal is encoded over a Gabor dictionary, and the elements of the resulting weight vector are rearranged into a tensor. The modes of this tensor characterize the time, frequency and duration properties of the signal, so the tensor is a joint time-frequency-duration representation of the signal. The tensor is then factorized, and the resulting frequency factors and duration factors are concatenated to form the audio feature. Because sparse tensor factorization is prone to over-fitting, a factorization algorithm that automatically adjusts the penalty coefficient is proposed. Experimental results on 15-category audio effect classification show that the proposed feature outperforms the Mel-Frequency Cepstrum Coefficient (MFCC) feature, the MFCC+MP feature (MFCC concatenated with features obtained by Matching Pursuit (MP)), and the nonuniform scale-frequency map feature by 28.0%, 19.8% and 6.7%, respectively.

Key words: sparse representation, tensor factorization, audio effect classification, time-frequency feature
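
The pipeline summarized in the abstract (sparse coding over a Gabor dictionary, folding the weights into a time-frequency-duration tensor, factorizing it, and concatenating the frequency and duration factors) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the atom formula or the parameter grids, so those are assumptions here, as are the use of plain matching pursuit, taking absolute weights to obtain a non-negative tensor, and every function name. In particular, `nonneg_cp` uses ordinary multiplicative updates with a fixed penalty `lam` as a stand-in; the self-adjusting penalty scheme proposed in the paper is not reproduced.

```python
import numpy as np


def gabor_atoms(n, shifts, freqs, scales):
    """Unit-norm real Gabor atoms, shape (n_shifts, n_freqs, n_scales, n)."""
    t = np.arange(n)
    atoms = np.empty((len(shifts), len(freqs), len(scales), n))
    for i, u in enumerate(shifts):          # time position (samples)
        for j, f in enumerate(freqs):       # frequency (cycles per sample)
            for k, s in enumerate(scales):  # duration / scale (samples)
                g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(2 * np.pi * f * (t - u))
                atoms[i, j, k] = g / np.linalg.norm(g)
    return atoms


def matching_pursuit(x, D, n_atoms):
    """Greedy matching pursuit; D holds one unit-norm atom per row."""
    residual = np.asarray(x, dtype=float).copy()
    w = np.zeros(D.shape[0])
    for _ in range(n_atoms):
        corr = D @ residual                    # correlation with every atom
        best = int(np.argmax(np.abs(corr)))    # best-matching atom
        w[best] += corr[best]
        residual -= corr[best] * D[best]
    return w, residual


def weight_tensor(x, atoms, n_atoms=10):
    """Sparse-code x, then fold |weights| into a time x frequency x duration tensor."""
    D = atoms.reshape(-1, atoms.shape[-1])
    w, _ = matching_pursuit(x, D, n_atoms)
    # Assumption: absolute values are used so the tensor is non-negative.
    return np.abs(w).reshape(atoms.shape[:3])


def nonneg_cp(X, rank, n_iter=200, lam=0.0, seed=0, eps=1e-12):
    """Non-negative CP factors of a 3-way tensor via multiplicative updates.

    lam is a fixed L1 penalty (a stand-in, not the paper's self-adjusting scheme).
    """
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((dim, rank)) + 0.1 for dim in X.shape)
    for _ in range(n_iter):
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + lam + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + lam + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + lam + eps)
    return A, B, C


def audio_feature(x, atoms, rank=2, n_atoms=10, lam=0.0):
    """Concatenate the frequency and duration factors into one feature vector."""
    T = weight_tensor(x, atoms, n_atoms)
    _, B, C = nonneg_cp(T, rank, lam=lam)
    return np.concatenate([B.ravel(), C.ravel()])
```

For a frame `x` and a dictionary built with `gabor_atoms(len(x), shifts, freqs, scales)`, `audio_feature(x, atoms, rank=r)` returns a vector of length `r * (len(freqs) + len(scales))`. CP components are only defined up to permutation and scaling, so a practical system would need some convention for ordering and normalizing them before feeding a classifier; the sketch leaves that out.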

CLC number: TN912.3

LIN Jing, YANG Jichen, ZHANG Xueyuan, LI Xinchao. Audio feature extraction algorithm based on weight tensor of sparse representation[J]. Journal of Computer Applications, 2016, 36(5): 1426-1429.
