Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (3): 878-882. DOI: 10.11772/j.issn.1001-9081.2019071152

• Virtual reality and multimedia computing •

Speech enhancement algorithm based on MMSE spectral subtraction with Laplacian distribution

WANG Yongbiao1,2, ZHANG Wenxi1,2, WANG Yahui1, KONG Xinxin1, LYU Tong1,2   

  1. Key Laboratory of Computation Optical Imaging Technology, Academy of Opto-Electronics, Chinese Academy of Sciences, Beijing 100094, China;
    2. School of Electronic, Electrical and Communication, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2019-07-03 Revised: 2019-08-30 Online: 2020-03-10 Published: 2019-09-19
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61605217).

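  • Corresponding author: ZHANG Wenxi
  • About the authors: WANG Yongbiao (born 1995), male, from Dezhou, Shandong, M.S. candidate; research interest: speech signal processing. ZHANG Wenxi (born 1979), male, from Suzhou, Anhui, research fellow, Ph.D.; research interests: laser measurement, coherent imaging. WANG Yahui (born 1990), female, from Heze, Shandong, Ph.D. candidate; research interest: speech signal processing. KONG Xinxin (born 1988), male, from Qufu, Shandong, Ph.D. candidate; research interest: optical precision measurement. LYU Tong (born 1995), female, from Beijing, M.S. candidate; research interest: optical precision measurement.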

Abstract: A Minimum Mean Square Error (MMSE) spectral subtraction algorithm based on the Laplacian distribution was proposed to address the residual noise and speech distortion found in speech enhanced by spectral subtraction algorithms based on the Gaussian distribution. First, the original noisy speech signal was framed and windowed, and a Fourier transform was applied to each processed frame to obtain the short-time Discrete Fourier Transform (DFT) coefficients of the speech. Second, noise frames were detected by calculating the logarithmic spectral energy and spectral flatness of each frame, and the noise estimate was updated accordingly. Third, under the assumption that the speech DFT coefficients follow a Laplacian distribution, the optimal spectral subtraction coefficient was derived under the MMSE criterion, and spectral subtraction with this coefficient yielded the enhanced signal spectrum. Finally, the enhanced signal spectrum was subjected to an inverse Fourier transform and the frames were recombined to obtain the enhanced speech. The experimental results show that the proposed algorithm increases the Signal-to-Noise Ratio (SNR) of the enhanced speech by 4.3 dB on average, a 2 dB improvement over the over-subtraction method. In terms of Perceptual Evaluation of Speech Quality (PESQ) score, the average score of the proposed algorithm is 10% higher than that of the over-subtraction method. The proposed algorithm suppresses noise better and introduces less speech distortion, and it improves substantially on both the SNR and PESQ evaluation measures.

Key words: speech enhancement, spectral subtraction, Minimum Mean Square Error (MMSE), short-time logarithmic spectrum, spectral flatness

CLC Number: