Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (10): 2939-2944.

• Multimedia Technology •

A Mandarin tone modeling method based on articulatory features and its application to Mandarin speech recognition

CHAO Hao1,2, YANG Zhanlei2, LIU Wenju2

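  1. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
  2. National Laboratory of Pattern Recognition (Institute of Automation, Chinese Academy of Sciences), Beijing 100190, China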
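  • Received: 2013-04-12 Revised: 2013-06-07 Published: 2013-10-01 Online: 2013-11-01
  • Corresponding author: CHAO Hao
  • About the authors:
    CHAO Hao (1981-), male, born in Xuchang, Henan; lecturer, Ph.D.; main research interest: speech recognition. YANG Zhanlei (1984-), male, born in Shijiazhuang, Hebei; assistant researcher, Ph.D.; main research interest: speech recognition. LIU Wenju (1960-), male, born in Beijing; research professor, doctoral supervisor, Ph.D.; main research interests: speech recognition, speech enhancement, and computational auditory scene analysis.
  • Funding:
    Supported by the National Natural Science Foundation of China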

Improved tone modeling by exploiting articulatory features for Mandarin speech recognition

CHAO Hao1,2, YANG Zhanlei1, LIU Wenju1

  1. National Laboratory of Pattern Recognition (Institute of Automation, Chinese Academy of Sciences), Beijing 100190, China
  2. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
  • Received: 2013-04-12 Revised: 2013-06-07 Online: 2013-11-01 Published: 2013-10-01
  • Contact: CHAO Hao

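Abstract: Articulatory features characterize the manner in which speech is articulated and can complement traditional prosodic features to improve the accuracy of tone modeling. Based on an analysis of the articulation characteristics of Mandarin initials and finals, the manner of articulation is divided into 19 classes, and hierarchical multilayer perceptrons are used to compute the posterior probability that the speech signal belongs to each class; these posteriors serve as the articulatory features. The articulatory features are then used together with traditional prosodic features for tone modeling. Experimental results show that adding articulatory features raises tone recognition accuracy by about 5% under three different modeling methods. After the tone model is integrated into a large-vocabulary continuous speech recognition system, the character error rate drops markedly.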

Key words: speech recognition, tone modeling, articulatory feature, hierarchical multilayer perceptron classifier

Abstract: Articulatory features, which capture how speech sounds are articulated, can complement prosodic features and improve tone recognition. In this paper, 19 pronunciation categories are defined according to the articulation characteristics of Mandarin initials and finals. Nineteen articulatory tandem features, the posterior probabilities that the speech signal belongs to each of the 19 categories, are then computed by hierarchical multilayer perceptron classifiers. These articulatory tandem features are used together with prosodic features for tone modeling. Tone recognition experiments with three kinds of tone models show an absolute accuracy gain of about 5% when both articulatory and prosodic features are used. When the proposed tone model is integrated into a Large Vocabulary Continuous Speech Recognition (LVCSR) system, the character error rate is reduced significantly.
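
The tandem-style feature construction described in the abstract, where per-frame class posteriors are appended to prosodic features, can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions: the one-hidden-layer network with random weights is a placeholder and not the hierarchical multilayer perceptron classifiers of the paper, and the frame count and feature dimensions (39 acoustic, 6 prosodic, 64 hidden units) are hypothetical values chosen for illustration. Only the number of classes, 19, comes from the abstract.

```python
import numpy as np

NUM_AF_CLASSES = 19  # number of pronunciation categories stated in the abstract


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mlp_posteriors(acoustic, w1, b1, w2, b2):
    """Placeholder one-hidden-layer MLP (NOT the paper's hierarchical classifier).

    Maps each acoustic frame to a posterior distribution over the 19 classes.
    """
    hidden = np.tanh(acoustic @ w1 + b1)
    return softmax(hidden @ w2 + b2)


def tone_features(acoustic, prosodic, params):
    """Append the 19 class posteriors to the prosodic features of each frame."""
    af = mlp_posteriors(acoustic, *params)
    return np.hstack([prosodic, af])


# All sizes below are assumptions made for this illustration.
rng = np.random.default_rng(0)
n_frames, acoustic_dim, hidden_dim, prosodic_dim = 50, 39, 64, 6
params = (
    rng.normal(size=(acoustic_dim, hidden_dim)),    # untrained, random weights
    np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, NUM_AF_CLASSES)),
    np.zeros(NUM_AF_CLASSES),
)
acoustic = rng.normal(size=(n_frames, acoustic_dim))  # synthetic stand-in frames
prosodic = rng.normal(size=(n_frames, prosodic_dim))  # synthetic stand-in features

features = tone_features(acoustic, prosodic, params)
print(features.shape)  # (50, 25): 6 prosodic + 19 posterior dimensions per frame
```

In a real system the posteriors would come from trained classifiers, and the combined vectors would feed whichever tone model is being trained; neither step is shown here.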

Key words: speech recognition, tone modeling, Articulatory Feature (AF), hierarchical multilayer perceptron classifier

CLC number: