Speech enhancement method based on sparsity-regularized non-negative matrix factorization

Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (4): 1176-1180. DOI: 10.11772/j.issn.1001-9081.2017092316

• Virtual reality and multimedia computing •

JIANG Maosong, WANG Dongxia, NIU Fanglin, CAO Yudong

College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou Liaoning 121001, China

Received: 2017-09-26 Revised: 2017-10-27 Online: 2018-04-10 Published: 2018-04-09
Corresponding author: WANG Dongxia
About the authors: JIANG Maosong (born 1989), male, from Lu'an, Anhui, M.S. candidate; research interests: modern signal processing, multimedia. WANG Dongxia (born 1975), female, from Jinzhou, Liaoning, professor, Ph.D.; research interests: array processing, speech processing and communication. NIU Fanglin (born 1971), female, from Jinzhou, Liaoning, associate professor, Ph.D.; research interests: information theory, channel coding, digital fountain codes. CAO Yudong (born 1975), male, from Jinzhou, Liaoning, associate professor, Ph.D.; research interests: image recognition, image understanding.
Supported by:
This work is partially supported by the Scientific Public Welfare Research Foundation of Liaoning Province (20170056).

Abstract: To improve the robustness of Non-negative Matrix Factorization (NMF) based speech enhancement under different background noises, a speech enhancement algorithm based on Sparsity-regularized Robust NMF (SRNMF) was proposed. The algorithm accounts for the effect of noise during data processing and imposes a sparsity constraint on the coefficient matrix, so that the decomposed data retain better speech characteristics. First, prior dictionary matrices of the amplitude spectra of speech and noise were learned, and a joint dictionary matrix was constructed from them. Then, the SRNMF algorithm was used to update the coefficient matrix of the noisy speech amplitude spectrum under the joint dictionary matrix. Finally, the original clean speech was reconstructed, achieving speech enhancement. The speech enhancement performance of the SRNMF algorithm in different environmental noises was analyzed through simulation experiments. Experimental results show that, under non-stationary noise and low Signal-to-Noise Ratio (SNR) (<0 dB), the proposed algorithm effectively weakens the influence of noise variation on performance. It not only achieves a higher Source-to-Distortion Ratio (SDR), an improvement of about 1 to 1.5 orders of magnitude, but also runs faster than the other algorithms compared, which makes NMF-based speech enhancement more practical.

Key words: Non-negative Matrix Factorization (NMF), speech enhancement, sparsity-regularization, robustness, joint dictionary
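The three steps described in the abstract (learn speech and noise dictionaries, update sparse coefficients under the joint dictionary, reconstruct the speech) can be sketched roughly as follows. This is a minimal illustration, not the paper's SRNMF update rules, which are not given on this page: it assumes a Euclidean cost with an L1 penalty on the coefficient matrix, standard multiplicative updates, and a Wiener-style soft mask for reconstruction. The function names `learn_dictionary`, `sparse_activations` and `enhance`, the `sparsity` weight, and the iteration counts are all hypothetical. Inputs are non-negative magnitude spectrograms (frequency by time); computing the spectrogram and resynthesizing a waveform are omitted.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def learn_dictionary(V, rank, n_iter=200, seed=0):
    """Learn a dictionary W (freq x rank) from a magnitude spectrogram V
    using plain multiplicative-update NMF (Euclidean cost)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + EPS
    H = rng.random((rank, n_frames)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
        # Normalise each atom to unit L2 norm; rescale H so W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + EPS
        W /= norms
        H *= norms[:, None]
    return W


def sparse_activations(V, W, sparsity=0.1, n_iter=200, seed=0):
    """Estimate coefficients H >= 0 for a fixed dictionary W.
    Assumed cost: 0.5 * ||V - W H||_F^2 + sparsity * sum(H)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        # The L1 penalty appears as a constant added to the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
    return H


def enhance(V_noisy, W_speech, W_noise, sparsity=0.1, n_iter=200):
    """Return an estimate of the clean speech magnitude spectrogram."""
    W_joint = np.hstack([W_speech, W_noise])  # joint dictionary
    H = sparse_activations(V_noisy, W_joint, sparsity, n_iter)
    k = W_speech.shape[1]
    speech_hat = W_speech @ H[:k]
    noise_hat = W_noise @ H[k:]
    # Assumed reconstruction: soft mask applied to the noisy magnitudes.
    mask = speech_hat / (speech_hat + noise_hat + EPS)
    return mask * V_noisy
```

In this sketch the dictionaries are learned once from separate speech and noise training spectrograms and then held fixed, so only the coefficient matrix is updated for each noisy utterance. A larger `sparsity` value pushes more coefficients toward zero.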

CLC number: TN912.35

JIANG Maosong, WANG Dongxia, NIU Fanglin, CAO Yudong. Speech enhancement method based on sparsity-regularized non-negative matrix factorization[J]. Journal of Computer Applications, 2018, 38(4): 1176-1180.
