Improved tone modeling by exploiting articulatory features for Mandarin speech recognition

Journal of Computer Applications, 2013, Vol. 33, Issue (10): 2939-2944.

CHAO Hao¹,², YANG Zhanlei¹, LIU Wenju¹

1. National Laboratory of Pattern Recognition (Institute of Automation, Chinese Academy of Sciences), Beijing 100190, China
2. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China

Received: 2013-04-12 Revised: 2013-06-07 Online: 2013-11-01 Published: 2013-10-01
Corresponding author: CHAO Hao
About the authors:
CHAO Hao (born 1981), male, from Xuchang, Henan; lecturer, Ph.D.; main research interest: speech recognition. YANG Zhanlei (born 1984), male, from Shijiazhuang, Hebei; assistant research fellow, Ph.D.; main research interest: speech recognition. LIU Wenju (born 1960), male, from Beijing; research fellow, doctoral supervisor, Ph.D.; main research interests: speech recognition, speech enhancement, and computational auditory scene analysis.
Funding:
Supported by the National Natural Science Foundation of China

Abstract

Abstract: Articulatory features, which represent information about the manner of articulation, can complement conventional prosodic features and improve the accuracy of tone modeling. Based on an analysis of the pronunciation characteristics of Mandarin initials and finals, the manner of articulation is divided into 19 categories. Hierarchical multilayer perceptron classifiers then compute the posterior probability that the speech signal belongs to each category, and these 19 posteriors serve as articulatory tandem features. The articulatory features are used together with prosodic features for tone modeling. Tone recognition experiments with three kinds of tone models show an absolute accuracy gain of about 5% when articulatory features are added to the prosodic features. When the proposed tone model is integrated into a Large Vocabulary Continuous Speech Recognition (LVCSR) system, the character error rate is reduced significantly.

Key words: speech recognition, tone modeling, Articulatory Feature (AF), hierarchical multilayer perceptron classifier
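The pipeline summarized in the abstract (posteriors over 19 articulatory categories, appended to prosodic features) can be illustrated with a rough sketch. Everything below is an assumption made for illustration: the abstract does not specify the hierarchy, so a two-level factorization p(category) = p(group) × p(category | group) is assumed, the 7/12 group split is hypothetical, the weights are random rather than trained, and all function names and feature dimensions are invented here. It is not the authors' implementation.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mlp_posteriors(x, w1, b1, w2, b2):
    """One-hidden-layer MLP returning per-frame class posteriors."""
    hidden = np.tanh(x @ w1 + b1)
    return softmax(hidden @ w2 + b2)


def hierarchical_posteriors(x, coarse_net, fine_nets):
    """Assumed two-level hierarchy: p(category) = p(group) * p(category | group)."""
    p_group = mlp_posteriors(x, *coarse_net)  # shape (T, number of groups)
    parts = [
        p_group[:, [g]] * mlp_posteriors(x, *net)  # broadcast group prob over its categories
        for g, net in enumerate(fine_nets)
    ]
    return np.concatenate(parts, axis=1)  # shape (T, total categories)


def tandem_features(prosodic, acoustic, coarse_net, fine_nets):
    """Append articulatory posteriors to the prosodic feature vector of each frame."""
    articulatory = hierarchical_posteriors(acoustic, coarse_net, fine_nets)
    return np.concatenate([prosodic, articulatory], axis=1)


# Toy setup: all sizes below are hypothetical, chosen only so the demo runs.
rng = np.random.default_rng(0)
T, D_ACOUSTIC, D_PROSODIC, HIDDEN = 5, 39, 6, 16
GROUP_SIZES = [7, 12]  # hypothetical split of the 19 categories into two groups


def init_net(d_in, d_hidden, d_out):
    """Random (untrained) weights standing in for a trained classifier."""
    return (
        rng.normal(size=(d_in, d_hidden)),
        np.zeros(d_hidden),
        rng.normal(size=(d_hidden, d_out)),
        np.zeros(d_out),
    )


coarse_net = init_net(D_ACOUSTIC, HIDDEN, len(GROUP_SIZES))
fine_nets = [init_net(D_ACOUSTIC, HIDDEN, k) for k in GROUP_SIZES]

acoustic = rng.normal(size=(T, D_ACOUSTIC))
prosodic = rng.normal(size=(T, D_PROSODIC))

features = tandem_features(prosodic, acoustic, coarse_net, fine_nets)
print(features.shape)  # (5, 25): 6 prosodic dims + 19 articulatory posteriors
```

Under this factorization the 19 posteriors of each frame sum to one, so they can be handed to a downstream tone model as an ordinary probability vector alongside the prosodic features.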

CLC number: TP391.42

CHAO Hao, YANG Zhanlei, LIU Wenju. Improved tone modeling by exploiting articulatory features for Mandarin speech recognition[J]. Journal of Computer Applications, 2013, 33(10): 2939-2944.
