Multi-stream based Tandem feature method for mispronunciation detection

Journal of Computer Applications, 2014, Vol. 34, Issue (6): 1694-1698. DOI: 10.11772/j.issn.1001-9081.2014.06.1694

YUAN Hua¹,², CAI Meng¹,², ZHAO Junhong³,⁴,⁵, ZHANG Weiqiang¹,², LIU Jia¹,²

1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China;
2. Tsinghua National Laboratory for Information Science and Technology (Tsinghua University), Beijing 100084, China;
3. Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China;
4. University of Chinese Academy of Sciences, Beijing 100190, China;
5. State Key Laboratory of Transducer Technology (Chinese Academy of Sciences), Beijing 100190, China

Received: 2013-12-16 Revised: 2014-01-21 Online: 2014-06-01 Published: 2014-07-02
Contact: YUAN Hua
About the authors: YUAN Hua (1985-), female, born in Xishui, Hubei, Ph.D. candidate, research interest: mispronunciation detection; CAI Meng (1987-), male, born in Cangzhou, Hebei, Ph.D. candidate, research interest: automatic speech recognition; ZHAO Junhong (1987-), female, born in Heze, Shandong, Ph.D. candidate, research interest: speech synthesis; ZHANG Weiqiang (1979-), male, born in Xiongxian, Hebei, assistant researcher, Ph.D., research interest: pattern recognition; LIU Jia (1954-), male, born in Fuzhou, Fujian, professor, Ph.D., research interest: speech signal processing.
Supported by:
National Natural Science Foundation of China

Abstract:

Labeled pronunciation data for mispronunciation detection are scarce. To address this, auxiliary data are used within the Tandem framework to improve the discriminability of the features. Taking English spoken by Chinese learners as the object of study, three relatively easy-to-obtain kinds of auxiliary data were selected: unlabeled (uncorrected) pronunciation data, native Mandarin data, and native English data. Experiments show that all three effectively improve system performance, with the unlabeled data performing best. The effects of different frame-context lengths, of shallow versus deep neural networks (represented by the Multi-Layer Perceptron (MLP) and the Deep Neural Network (DNN), respectively), and of different Tandem feature structures were also compared. Finally, a multi-stream merging strategy was used to further improve performance: the Tandem feature that combines the DNN-based unlabeled data stream with the native English data stream performed best. Compared with the baseline system, recognition accuracy increased by 7.96% and the diagnostic accuracy of mispronunciation type increased by 14.71%.
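The abstract does not give implementation details, so the following is only a minimal sketch of how a multi-stream Tandem feature is commonly assembled, not the authors' exact pipeline. It assumes the conventional Tandem recipe: splice neighbouring frames, obtain phone posteriors from a network per data stream, take logs, decorrelate with PCA, and append the result to the base acoustic features. The function names, the 39-dimensional base features, the 40 phone classes, the context of 4 frames, and the random stand-in "networks" are all hypothetical choices made for illustration.

```python
import numpy as np

def splice_frames(feats, context):
    """Stack each frame with `context` neighbours on each side (edge frames repeat)."""
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n_frames] for i in range(2 * context + 1)])

def toy_posteriors(spliced, n_phones, seed):
    """Hypothetical stand-in for a trained MLP/DNN: random linear layer + softmax."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=(spliced.shape[1], n_phones))
    logits = spliced @ weights
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def pca_project(x, n_components):
    """Decorrelate and reduce x with PCA (estimated on x itself for brevity;
    a real system would estimate the transform on training data)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def tandem_features(base_feats, stream_posteriors, n_components):
    """Append PCA-reduced log-posteriors of every stream to the base features."""
    parts = [base_feats]
    for post in stream_posteriors:
        log_post = np.log(np.clip(post, 1e-10, None))
        parts.append(pca_project(log_post, n_components))
    return np.hstack(parts)

# Toy utterance: 50 frames of 39-dimensional base features (random placeholders).
base = np.random.default_rng(0).normal(size=(50, 39))
spliced = splice_frames(base, context=4)
# Two streams, e.g. one network per auxiliary data set (different seeds here).
streams = [toy_posteriors(spliced, n_phones=40, seed=s) for s in (1, 2)]
tandem = tandem_features(base, streams, n_components=25)
print(spliced.shape, tandem.shape)
```

In this sketch, merging streams is plain concatenation, so adding or dropping an auxiliary data stream only changes the list passed to `tandem_features`; other merging schemes are possible and the abstract does not say which one the paper uses.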

CLC number:

TP391.42

YUAN Hua, CAI Meng, ZHAO Junhong, ZHANG Weiqiang, LIU Jia. Multi-stream based Tandem feature method for mispronunciation detection[J]. Journal of Computer Applications, 2014, 34(6): 1694-1698.

