Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (10): 3065-3070. DOI: 10.11772/j.issn.1001-9081.2019030486

• Virtual reality and multimedia computing •

Monaural speech enhancement algorithm based on mask estimation and optimization

GE Wanying, ZHANG Tianqi   

  1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2019-03-25 Revised: 2019-06-20 Online: 2019-07-05 Published: 2019-10-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61671095, 61702065, 61701067, 61771085), the Project of Key Laboratory of Signal and Information Processing of Chongqing (CSTC2009CA2003), the Chongqing Graduate Research and Innovation Project (CYS17219), and the Research Project of Chongqing Educational Commission (KJ1600427, KJ1600429).

  • Original title (Chinese): 基于掩蔽估计与优化的单通道语音增强算法
  • Corresponding author: ZHANG Tianqi (张天骐)
  • About the authors: GE Wanying (葛宛营, born 1994), male, from Sanmenxia, Henan; master's student; research interests: speech signal processing and speech enhancement. ZHANG Tianqi (张天骐, born 1971), male, from Meishan, Sichuan; professor, Ph.D.; research interests: spread-spectrum communication, blind signal processing, and speech signal processing.

Abstract: Monaural speech enhancement algorithms obtain enhanced speech by estimating and suppressing the noise components in noisy speech. However, noise estimation algorithms tend to over-estimate, so that some estimated noise power values are larger than the actual ones. Although these over-estimates can be removed by compensation, the error introduced by the compensation also degrades the overall quality of the enhanced speech. To address this problem, a time-frequency mask estimation and optimization algorithm based on Computational Auditory Scene Analysis (CASA) was proposed. Firstly, the Decision Directed (DD) algorithm was used to estimate the a priori Signal-to-Noise Ratio (SNR) and calculate the initial mask. Secondly, the Inter-Channel Correlation (ICC) factor between the noise and the noisy speech in each Gammatone filterbank channel was used to calculate the noise presence probability; combining this probability with the power spectrum of the noisy speech gave a new noise estimate with less over-estimation than the initial one. Thirdly, the initial mask was iterated by the optimization algorithm to reduce the error caused by noise over-estimation and to increase the target speech components in the mask; the iteration stopped once the stopping conditions were met, yielding the new mask. Finally, the enhanced speech was synthesized by using the new mask. Experimental results demonstrate that, under different background noises, the new mask gives the enhanced speech higher Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) values than the mask before optimization, improving the intelligibility and listening quality of the speech.

Key words: computational auditory scene analysis, speech enhancement, time-frequency mask, noise estimation, mask optimization, speech intelligibility
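
The first step of the abstract (DD estimation of the a priori SNR, followed by an initial mask) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption: the function name `dd_initial_mask`, the smoothing factor `alpha`, the floor `xi_min`, the initialization of the recursion, and the Wiener-type mask `xi / (1 + xi)` follow the textbook decision-directed recursion and are not taken from the paper. The sketch processes a single frequency channel frame by frame.

```python
def dd_initial_mask(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR and a Wiener-type mask for one channel.

    noisy_power: power of the noisy speech in each frame
    noise_power: estimated noise power in each frame (same length)
    alpha, xi_min and the Wiener-type mask are illustrative assumptions,
    not values or formulas taken from the paper.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("noisy_power and noise_power must have equal length")

    mask = []
    prev_gain = 1.0   # assumed initialization of the recursion
    prev_gamma = 1.0  # assumed initial a posteriori SNR
    for y_pow, n_pow in zip(noisy_power, noise_power):
        # A posteriori SNR: noisy power over estimated noise power.
        gamma = y_pow / max(n_pow, 1e-12)
        # Decision-directed estimate: weighted sum of the previous frame's
        # clean-speech estimate and the current instantaneous SNR.
        xi = alpha * prev_gain ** 2 * prev_gamma \
            + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        # Wiener-type mask value in (0, 1) derived from the a priori SNR.
        gain = xi / (1.0 + xi)
        mask.append(gain)
        prev_gain, prev_gamma = gain, gamma
    return mask


if __name__ == "__main__":
    # Three noise-only frames followed by three speech-dominated frames.
    noisy = [1.0, 1.0, 1.0, 100.0, 100.0, 100.0]
    noise = [1.0] * 6
    print(dd_initial_mask(noisy, noise))
```

In this sketch the mask decays toward zero over noise-only frames and rises toward one once the noisy power clearly exceeds the noise estimate. An over-estimated `noise_power` would pull the mask down in speech frames, which is the kind of error the noise re-estimation and mask optimization steps of the abstract aim to reduce.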
