Monaural speech enhancement algorithm based on mask estimation and optimization

doi:10.11772/j.issn.1001-9081.2019030486

Journal of Computer Applications, 2019, Vol. 39, Issue (10): 3065-3070.

• Virtual Reality and Multimedia Computing •

GE Wanying, ZHANG Tianqi

School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2019-03-25 Revised: 2019-06-20 Online: 2019-07-05 Published: 2019-10-10
Corresponding author: ZHANG Tianqi
About the authors: GE Wanying (born 1994), male, from Sanmenxia, Henan, M.S. candidate; research interests: speech signal processing and speech enhancement. ZHANG Tianqi (born 1971), male, from Meishan, Sichuan, professor, Ph.D.; research interests: spread spectrum communication, blind signal processing, and speech signal processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61671095, 61702065, 61701067, 61771085), the Project of Key Laboratory of Signal and Information Processing of Chongqing (CSTC2009CA2003), the Chongqing Graduate Research and Innovation Project (CYS17219), and the Research Project of Chongqing Educational Commission (KJ1600427, KJ1600429).

Abstract

Abstract: Monaural speech enhancement algorithms obtain enhanced speech by estimating and suppressing the noise components in noisy speech. However, noise estimation algorithms tend to over-estimate, so that some estimated noise energy values are larger than the actual values. Although these over-estimates can be removed through compensation, the error introduced by the compensation also degrades the overall quality of the enhanced speech. To address this problem, a time-frequency mask estimation and optimization algorithm based on Computational Auditory Scene Analysis (CASA) was proposed. Firstly, the Decision-Directed (DD) algorithm was used to estimate the a priori Signal-to-Noise Ratio (SNR) and calculate the initial mask. Secondly, the Inter-Channel Correlation (ICC) factor between the noise and the noisy speech in each Gammatone filterbank channel was used to calculate the noise presence probability; combining this probability with the power spectrum of the noisy speech gave a new noise estimate with less over-estimation than the original one. Thirdly, the initial mask was processed iteratively by an optimization algorithm to reduce the error caused by noise over-estimation and to increase the target speech components in the mask; the iteration stopped once the stopping condition was met, yielding a new mask. Finally, the enhanced speech was synthesized using the new mask. Experimental results show that, under different background noises, the optimized mask gives the enhanced speech higher Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) values than the mask before optimization, improving both the listening quality and the intelligibility of the speech.

Key words: computational auditory scene analysis, speech enhancement, time-frequency mask, noise estimation, mask optimization, speech intelligibility
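As an illustration of the first step in the abstract (decision-directed estimation of the a priori SNR followed by an initial mask), the following is a minimal sketch and not the authors' implementation. It uses the textbook decision-directed recursion. The Wiener-type gain xi / (1 + xi) used as the mask, the function name `dd_initial_mask`, the smoothing factor 0.98, the floor `xi_min`, and the first-frame initialization are all assumptions made for this example; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # power values indexed as [frame][channel]


def dd_initial_mask(
    noisy_power: Frames,
    noise_power: Frames,
    alpha: float = 0.98,   # assumed smoothing factor (a common textbook choice)
    xi_min: float = 1e-3,  # assumed lower bound on the a priori SNR
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (a priori SNR, mask) per frame and channel.

    Decision-directed rule:
        xi(t, k) = alpha * S_hat(t-1, k) / N(t-1, k)
                   + (1 - alpha) * max(gamma(t, k) - 1, 0)
    where gamma = noisy_power / noise_power is the a posteriori SNR and
    S_hat is the speech power estimated in the previous frame.
    """
    eps = 1e-12  # guards against division by zero
    xi_all: List[List[float]] = []
    mask_all: List[List[float]] = []
    prev_speech = None  # estimated speech power of the previous frame
    prev_noise = None   # noise power of the previous frame

    for y_frame, n_frame in zip(noisy_power, noise_power):
        xi_frame, mask_frame, speech_frame, noise_frame = [], [], [], []
        for k, (y, n) in enumerate(zip(y_frame, n_frame)):
            n = max(n, eps)
            ml = max(y / n - 1.0, 0.0)  # maximum-likelihood SNR term
            if prev_speech is None:
                xi = ml  # assumption: first frame uses the ML term only
            else:
                xi = alpha * prev_speech[k] / prev_noise[k] + (1.0 - alpha) * ml
            xi = max(xi, xi_min)
            gain = xi / (1.0 + xi)  # assumed Wiener-type mask in (0, 1)
            xi_frame.append(xi)
            mask_frame.append(gain)
            speech_frame.append(gain * gain * y)  # speech power estimate
            noise_frame.append(n)
        xi_all.append(xi_frame)
        mask_all.append(mask_frame)
        prev_speech, prev_noise = speech_frame, noise_frame

    return xi_all, mask_all
```

Calling `dd_initial_mask([[4.0], [4.0]], [[1.0], [1.0]])` returns one SNR value and one mask value per frame and channel. An over-estimated `noise_power` lowers both values, which is the kind of error the later noise re-estimation and mask optimization steps described in the abstract aim to reduce.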

CLC Number: TN912.35

GE Wanying, ZHANG Tianqi. Monaural speech enhancement algorithm based on mask estimation and optimization[J]. Journal of Computer Applications, 2019, 39(10): 3065-3070.
