Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (1): 257-261. DOI: 10.11772/j.issn.1001-9081.2015.01.0257


Robust speech recognition algorithm based on articulatory features for vocal effort variability

CHAO Hao, SONG Cheng, PENG Weiping   

  1. College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo Henan 454000, China
  • Received: 2014-08-19 Revised: 2014-09-30 Online: 2015-01-01 Published: 2015-01-26

基于发音特征的声效相关鲁棒语音识别算法

晁浩, 宋成, 彭维平   

  1. 河南理工大学 计算机科学与技术学院, 河南 焦作454000
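  • Corresponding author: CHAO Hao
  • About the authors: CHAO Hao (born 1981), male, from Xuchang, Henan, lecturer, Ph.D.; research interests: speech recognition, pattern recognition. SONG Cheng (born 1980), male, from Xinyang, Henan, lecturer, Ph.D.; research interest: information security. PENG Weiping (born 1979), male, from Tianmen, Hubei, associate professor, Ph.D.; research interests: intelligent computing, information security.
  • Supported by:

    National Natural Science Foundation of China (61300124); Basic and Frontier Technology Research Program of Henan Province (132300410332); Science and Technology Research Project of the Science and Technology Department of Henan Province (132102210123); Science and Technology Research Project of the Education Department of Henan Province (13A520321).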

Abstract:

To address the problem of robust speech recognition under Vocal Effort (VE) variability, a speech recognition algorithm based on a multi-model framework was presented. Firstly, the changes in acoustic characteristics across VE modes, and the influence of these changes on speech recognition accuracy, were analyzed. Secondly, a VE detection method based on the Gaussian Mixture Model (GMM) was proposed. Finally, dedicated acoustic models were trained to recognize whispered speech when VE detection returned whisper mode; otherwise, articulatory features were combined with traditional spectral features to recognize speech in the remaining four VE modes. Experiments on isolated-word recognition show that the proposed method significantly improves recognition accuracy: compared with the baseline system, the mixed-corpus training method and the Maximum Likelihood Linear Regression (MLLR) adaptation method, the average character error rate over the five VE modes is reduced by 26.69%, 14.51% and 15.30%, respectively. These results indicate that articulatory features are more robust than traditional spectral features to VE variability, and that the multi-model framework is an effective approach to robust speech recognition under VE variability.
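
The detection-and-routing step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes one diagonal-covariance GMM per VE mode, chooses the mode whose GMM gives the highest average frame log-likelihood, and then routes whisper to a dedicated model and every other mode to a model that uses articulatory plus spectral features. The mode names other than whisper, the toy one-dimensional parameters in `TOY_GMMS`, and the returned model labels are all hypothetical placeholders; the abstract does not specify them.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

Frame = Sequence[float]


@dataclass
class DiagGMM:
    """Diagonal-covariance Gaussian mixture; parameters are assumed pre-trained."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def frame_log_likelihood(self, x: Frame) -> float:
        # log sum_k w_k * N(x; mu_k, diag(var_k)), evaluated with log-sum-exp
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))

    def avg_log_likelihood(self, frames: Sequence[Frame]) -> float:
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def detect_ve_mode(frames: Sequence[Frame], gmms: Dict[str, DiagGMM]) -> str:
    """Return the VE mode whose GMM scores the utterance highest."""
    return max(gmms, key=lambda mode: gmms[mode].avg_log_likelihood(frames))


def select_recognizer(mode: str) -> str:
    """Multi-model routing: whisper gets its own model (labels are placeholders)."""
    if mode == "whisper":
        return "whisper_acoustic_model"
    return "articulatory_plus_spectral_model"


def toy_gmm(center: float) -> DiagGMM:
    # Hypothetical 1-D, 2-component mixture centred on `center` (illustration only)
    return DiagGMM([0.5, 0.5], [[center - 0.2], [center + 0.2]], [[0.25], [0.25]])


# Assumed mode names; only "whisper" is named in the abstract.
TOY_GMMS = {
    "whisper": toy_gmm(-2.0),
    "soft": toy_gmm(-1.0),
    "normal": toy_gmm(0.0),
    "loud": toy_gmm(1.0),
    "shouted": toy_gmm(2.0),
}

if __name__ == "__main__":
    utterance = [[-2.1], [-1.9], [-2.3]]  # toy frames near the whisper centre
    mode = detect_ve_mode(utterance, TOY_GMMS)
    print(mode, "->", select_recognizer(mode))
```

In a real system the frames would be multi-dimensional acoustic feature vectors and the GMM parameters would be estimated from labelled speech for each VE mode; only the argmax-then-route structure is meant to carry over.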

Key words: speech recognition, Vocal Effort (VE), articulatory feature, multi-model framework, isolated-word

摘要:

针对声效(VE)相关的语音识别鲁棒性问题,提出了基于多模型框架的语音识别算法.首先,分析了不同声效模式下语音信号的声学特性以及声效变化对语音识别精度的影响;然后,提出了基于高斯混合模型(GMM)的声效模式检测方法;最后,根据声效检测的结果,训练专门的声学模型用于耳语音识别,而将发音特征与传统的谱特征一起用于其余4种声效模式的语音识别.基于孤立词识别的实验结果显示,采用所提方法后语音识别准确率有了明显的提高:与基线系统相比,所提方法5种声效的平均字错误率降低了26.69%;与声学模型混合语料训练方法相比,平均字错误率降低了14.51%;与最大似然线性回归(MLLR)自适应方法相比,平均字错误率降低了15.30%.实验结果表明:与传统谱特征相比发音特征对于声效变化更具鲁棒性,而多模型框架是解决声效相关的语音识别鲁棒性问题的有效方法.

关键词: 语音识别, 声效, 发音特征, 多模型框架, 孤立词
