Journal of Computer Applications (official website) › 2023, Vol. 43 › Issue (4): 1303-1308. DOI: 10.11772/j.issn.1001-9081.2022030384

• Multimedia Computing and Computer Simulation •

Adaptive noise estimation method based on progressive ratio masking targets

高建清 (GAO Jianqing)1, 屠彦辉 (TU Yanhui)1, 马峰 (MA Feng)1, 付中华 (FU Zhonghua)2

  1. iFLYTEK Company Limited, Hefei 230088
    2. Xi’an iFLYTEK Hyper-brain Information Technology Company Limited, Xi’an 710000
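  • Received: 2022-03-30  Revised: 2022-09-05  Accepted: 2022-09-05  Online: 2023-04-11  Published: 2023-04-10
  • Corresponding author: GAO Jianqing
  • About the authors: TU Yanhui, born in 1990, male, native of Lu’an, Anhui, engineer, Ph. D., CCF member. His research interests include speech enhancement and speech recognition;
    MA Feng, born in 1986, male, native of Hefei, Anhui, engineer, M. S. His research interests include speech enhancement;
    FU Zhonghua, born in 1977, male, native of Shiyan, Hubei, associate professor, Ph. D., CCF member. His research interests include speech and audio signal processing and voiceprint recognition.
  • Supported by:
    Technological Innovation 2030-“New Generation Artificial Intelligence” Major Project (2018AAA0102200)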

Progressive ratio mask-based adaptive noise estimation method

Jianqing GAO1, Yanhui TU1, Feng MA1, Zhonghua FU2

  1. iFLYTEK Company Limited, Hefei Anhui 230088, China
    2. Xi’an iFLYTEK Hyper-brain Information Technology Company Limited, Xi’an Shaanxi 710000, China
  • Received:2022-03-30 Revised:2022-09-05 Accepted:2022-09-05 Online:2023-04-11 Published:2023-04-10
  • Contact: Jianqing GAO
  • About the authors: TU Yanhui, born in 1990, Ph. D., engineer. His research interests include speech enhancement and speech recognition.
    MA Feng, born in 1986, M. S., engineer. His research interests include speech enhancement.
    FU Zhonghua, born in 1977, Ph. D., associate professor. His research interests include speech and audio signal processing and voiceprint recognition.
  • Supported by:
    Technological Innovation 2030-“New Generation Artificial Intelligence” Major Project (2018AAA0102200)

Abstract:

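Deep-learning-based speech enhancement algorithms usually outperform traditional noise-suppression-based speech enhancement algorithms. However, when the training data and test data are mismatched, deep-learning-based speech enhancement algorithms often fail to work properly. To address this problem, a new Progressive Ratio Mask (PRM)-based Adaptive Noise Estimation (PRM-ANE) method was proposed and used as a preprocessing method for speech recognition systems. The proposed method combines two algorithms: the Improved Minima Controlled Recursive Averaging (IMCRA) algorithm, which can track noise at the frame level, and the progressive learning algorithm, which can learn the complex nonlinear mapping between noise and speech. Firstly, a two-Dimensional Convolutional Neural Network (2D-CNN) was used to learn PRMs that increase with the Signal-to-Noise Ratio (SNR). Secondly, the utterance-level estimated PRMs were combined through a traditional frame-level speech enhancement algorithm to perform speech enhancement. Finally, the enhanced speech based on multi-level information fusion was fed directly into the speech recognition system to improve its performance. Experimental results on the CHiME-4 real test set show that the proposed method achieves a Word Error Rate (WER) of 7.42%, a relative reduction of 51.41% compared with the IMCRA speech enhancement method, showing that the proposed method can effectively improve the performance of downstream recognition tasks.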
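As a worked check of the reported numbers, the relative WER reduction relates the two systems as below. This assumes (an interpretation; the abstract does not state the IMCRA figure directly) that 7.42% is the absolute WER of PRM-ANE and 51.41% is the relative reduction with respect to IMCRA. Under that assumption the implied IMCRA baseline WER is roughly 15.27%.

```latex
\[
\Delta_{\mathrm{rel}}
  = \frac{\mathrm{WER}_{\mathrm{IMCRA}} - \mathrm{WER}_{\mathrm{PRM\text{-}ANE}}}{\mathrm{WER}_{\mathrm{IMCRA}}}
\quad\Longrightarrow\quad
\mathrm{WER}_{\mathrm{IMCRA}}
  = \frac{\mathrm{WER}_{\mathrm{PRM\text{-}ANE}}}{1 - \Delta_{\mathrm{rel}}}
  = \frac{7.42\%}{1 - 0.5141} \approx 15.27\%
\]
```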

Key words: speech enhancement, deep learning, progressive ratio mask, speech recognition, CHiME-4 challenge

Abstract:

Deep-learning-based speech enhancement algorithms typically outperform traditional noise-suppression-based speech enhancement algorithms. However, they usually do not work well when there is a mismatch between training data and test data. To address this problem, a novel Progressive Ratio Mask (PRM)-based Adaptive Noise Estimation (PRM-ANE) method was proposed and used as a preprocessing step for speech recognition systems. The method jointly uses the Improved Minima Controlled Recursive Averaging (IMCRA) algorithm, which tracks noise at the frame level, and an utterance-level deep progressive learning algorithm, which learns the complex nonlinear interactions between speech and noise. Firstly, a two-Dimensional Convolutional Neural Network (2D-CNN) was adopted to learn PRMs that increase with the Signal-to-Noise Ratio (SNR). Then, the utterance-level PRMs were combined with the conventional frame-level speech enhancement algorithm to perform speech enhancement. Finally, the enhanced speech based on multi-level information fusion was fed directly into the speech recognition system to improve its performance. Experimental results on the CHiME-4 real test set show that the proposed method achieves a Word Error Rate (WER) of 7.42%, a 51.41% relative reduction compared with the IMCRA speech enhancement method, indicating that the proposed enhancement method can effectively improve the performance of downstream recognition tasks.
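The pipeline in the abstract (SNR-progressive mask targets feeding a frame-level recursive-averaging noise tracker) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The PRM definition (stage targets that keep the noise attenuated by 10 dB and 20 dB, with additive speech and noise powers), the helper names `progressive_ratio_masks` and `mask_guided_noise_tracking`, and the fusion rule (using the mask as the speech presence probability inside an MCRA/IMCRA-style smoothing factor) are all hypothetical. Full IMCRA also performs two-stage minima tracking and bias compensation, omitted here, and the trained 2D-CNN is replaced by oracle masks computed from known speech and noise powers.

```python
import numpy as np


def progressive_ratio_masks(speech_pow, noise_pow, snr_gains_db=(10.0, 20.0)):
    """Oracle PRM-style targets, one per progressive stage (hypothetical definition).

    Stage k's target is assumed to be speech plus noise attenuated by g_k dB.
    With uncorrelated speech and noise, powers add, so the magnitude mask is
        M_k = sqrt((|S|^2 + a_k |N|^2) / (|S|^2 + |N|^2)),  a_k = 10^(-g_k / 10).
    A larger g_k means a higher-SNR target and therefore a smaller mask in noise.
    """
    total = speech_pow + noise_pow + 1e-12           # noisy power; epsilon avoids 0/0
    masks = []
    for gain_db in snr_gains_db:
        atten = 10.0 ** (-gain_db / 10.0)            # noise power kept in this stage's target
        masks.append(np.sqrt((speech_pow + atten * noise_pow) / total))
    return masks


def mask_guided_noise_tracking(noisy_pow, mask, alpha_d=0.95, xi_min_db=-15.0):
    """Frame-recursive noise PSD tracking steered by a mask (hypothetical fusion).

    The smoothing factor follows the MCRA/IMCRA form alpha_d + (1 - alpha_d) * p,
    but p (speech presence) is taken from the mask instead of IMCRA's
    minima-controlled estimator. Arrays have shape (frames, bins).
    Returns per-bin Wiener gains and the noise power estimate for every frame.
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)              # floor on the a priori SNR
    noise = noisy_pow[0].copy()                      # initialise from the first frame
    gains = np.empty_like(noisy_pow)
    noise_track = np.empty_like(noisy_pow)
    for t in range(noisy_pow.shape[0]):
        p = np.clip(mask[t], 0.0, 1.0)               # speech presence proxy in [0, 1]
        smooth = alpha_d + (1.0 - alpha_d) * p       # close to 1 => noise update is frozen
        noise = smooth * noise + (1.0 - smooth) * noisy_pow[t]
        gamma = noisy_pow[t] / (noise + 1e-12)       # a posteriori SNR
        xi = np.maximum(gamma - 1.0, xi_min)         # ML a priori SNR estimate, floored
        gains[t] = xi / (1.0 + xi)                   # Wiener gain
        noise_track[t] = noise
    return gains, noise_track


# Toy example: stationary unit-power noise with a 10-frame "speech" burst.
speech_true = np.zeros((50, 4))
speech_true[20:30] = 10.0
noise_true = np.ones((50, 4))
noisy = speech_true + noise_true

m10, m20 = progressive_ratio_masks(speech_true, noise_true)
gains, noise_est = mask_guided_noise_tracking(noisy, m20)
_, noise_plain = mask_guided_noise_tracking(noisy, np.zeros_like(noisy))

print(noise_est[29, 0], noise_plain[29, 0])          # roughly 1.23 vs 5.01
```

In this toy case the mask-gated tracker's noise estimate stays near the true noise power during the speech burst (roughly 1.2 against a true value of 1), while a tracker that ignores the mask drifts to roughly 5 and would start suppressing speech. Gating the update, rather than replacing the tracker, is one plausible way to keep frame-level adaptation to noise unseen in training while borrowing the network's utterance-level knowledge.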

Key words: speech enhancement, deep learning, Progressive Ratio Mask (PRM), speech recognition, CHiME-4 challenge
