Journal of Computer Applications, 2018, Vol. 38, Issue (6): 1790-1794. DOI: 10.11772/j.issn.1001-9081.2017112678

Hierarchical speech recognition model in multi-noise environment

CAO Jingjing, XU Jieping, SHAO Shengqi   

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Received: 2017-11-14; Revised: 2018-01-09; Online: 2018-06-10; Published: 2018-06-13
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672523).
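  • Corresponding author: XU Jieping
  • About the authors: CAO Jingjing (1993-), female, born in Ma'anshan, Anhui, M.S. candidate; research interests: speech recognition. XU Jieping (1966-), female, born in Mudanjiang, Heilongjiang, associate professor, Ph.D., CCF member; research interests: audio information processing. SHAO Shengqi (1993-), male, born in Shenyang, Liaoning, M.S. candidate; research interests: speech recognition.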

Abstract: Focusing on speech recognition in multi-noise environments, a hierarchical speech recognition model that treats environmental noise as the context of speech recognition was proposed. The model consists of two layers: a noisy speech classification model and acoustic models for specific noise environments. The noisy speech classification model reduces the mismatch between training data and test data. This removes the requirement of noise stationarity that limits feature-space methods, and it overcomes the low recognition accuracy that traditional multi-type training suffers in certain noise environments. Furthermore, the acoustic models are built with Deep Neural Networks (DNN), which further enhances their ability to distinguish noise from speech and improves the noise robustness of speech recognition in the model space. In the experiments, the proposed model was compared with a baseline model obtained by multi-type training. The experimental results show that the proposed hierarchical model reduces the Word Error Rate (WER) by 20.3% relative to the baseline, indicating that it helps enhance the noise robustness of speech recognition.
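The two-layer structure described above (classify the noise environment first, then decode with the acoustic model matched to that environment) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in: `HierarchicalRecognizer`, `toy_classifier`, the noise labels "white" and "babble", and the one-dimensional frame features are assumptions for illustration, and the paper's noisy speech classification model and DNN acoustic models are replaced by toy functions. The sketch also shows how WER (word-level edit distance divided by reference length) and a relative WER reduction such as the reported 20.3% are computed; the WER values used below are made up, not the paper's results.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types: an utterance is a sequence of frame feature vectors,
# and an acoustic model (plus decoder) maps it to a word sequence.
Features = Sequence[Sequence[float]]
AcousticModel = Callable[[Features], List[str]]


class HierarchicalRecognizer:
    """Toy two-layer recognizer: noise classifier, then noise-specific model."""

    def __init__(self, classify_noise: Callable[[Features], str],
                 acoustic_models: Dict[str, AcousticModel]) -> None:
        self.classify_noise = classify_noise    # layer 1: noisy speech classifier
        self.acoustic_models = acoustic_models  # layer 2: one model per noise type

    def recognize(self, feats: Features) -> List[str]:
        noise_type = self.classify_noise(feats)
        # Route the utterance to the model trained for the detected environment.
        return self.acoustic_models[noise_type](feats)


def word_error_rate(ref: List[str], hyp: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(ref)."""
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Relative WER reduction: (baseline - proposed) / baseline."""
    return (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical layer 1: label the noise by the mean of the first feature dim.
def toy_classifier(feats: Features) -> str:
    mean = sum(frame[0] for frame in feats) / len(feats)
    return "babble" if mean > 0.5 else "white"


# Hypothetical layer 2: fixed outputs stand in for decoding with DNN models.
toy_models: Dict[str, AcousticModel] = {
    "white": lambda feats: ["turn", "on", "the", "light"],
    "babble": lambda feats: ["turn", "of", "the", "light"],
}

rec = HierarchicalRecognizer(toy_classifier, toy_models)
hyp = rec.recognize([[0.9], [0.8]])  # routed to the "babble" model
wer = word_error_rate(["turn", "on", "the", "light"], hyp)  # 1 substitution / 4 words
```

With made-up baseline and proposed WERs of 10.0% and 7.97%, `relative_reduction(10.0, 7.97)` gives about 0.203, i.e. a 20.3% relative reduction; note that a relative reduction is not the same as a drop of 20.3 percentage points.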

Key words: speech recognition, noise-robustness, environmental noise, acoustic model, Deep Neural Network (DNN)
