Robust speech recognition by adopting random projection in feature space

Journal of Computer Applications, 2012, Vol. 32, Issue (07): 2070-2073. DOI: 10.3724/SP.J.1087.2012.02070

ZHOU A-zhuan (周阿转), YU Yi-biao (俞一彪)

Speech Technology Laboratory, Soochow University, Suzhou, Jiangsu 215006, China

Received: 2011-12-13 Revised: 2012-02-16 Online: 2012-07-05 Published: 2012-07-01
Contact: ZHOU A-zhuan
About the authors: ZHOU A-zhuan (born 1981), female, from Xi'an, Shaanxi, master's student; main research interests: speech signal processing, speech recognition. YU Yi-biao (born 1962), male, from Wuxi, Jiangsu, professor, Ph.D.; main research interests: speech signal processing, information hiding, multimedia processing.

Abstract

Noise interference significantly degrades speech recognition performance. To address this problem, a robust speech recognition method based on Random Projection (RP) of the feature space is proposed and applied to speech recognition in car driving environments. First, the original speech feature parameters are linearly mapped into a new feature space using random matrices, so that the new features preserve the distances among the original features with maximum probability while following a distribution closer to Gaussian. A Hidden Markov Model (HMM) is then trained for each word. In the test stage, the initial pattern-matching results are combined by majority voting to produce the final recognition decision. Experiments on CENSREC-2, the in-car speech recognition database of the Information Processing Society of Japan, show that the random projection features substantially improve speech recognition performance in car driving environments.

Key words: speech recognition, Random Projection (RP), majority voting, CENSREC-2
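
The two generic building blocks named in the abstract, a random linear mapping of feature vectors and a majority vote over initial recognition results, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian matrix entries, the 1/sqrt(k) scaling, the dimensions, and the idea of one recognizer per random matrix are all assumptions made for illustration, and the HMM stage is replaced by placeholder labels.

```python
import math
import random
from collections import Counter


def random_projection_matrix(out_dim, in_dim, seed=0):
    """Draw an out_dim x in_dim matrix with i.i.d. Gaussian entries.

    Entries are scaled by 1/sqrt(out_dim) so that squared Euclidean
    norms (and hence distances) are preserved in expectation.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, frame):
    """Linearly map one feature vector into the new feature space."""
    return [sum(w * x for w, x in zip(row, frame)) for row in matrix]


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical usage: three random matrices applied to one 12-dimensional frame.
matrices = [random_projection_matrix(12, 12, seed=s) for s in range(3)]
frame = [0.1 * i for i in range(12)]
projected_frames = [project(m, frame) for m in matrices]

# Placeholder decisions standing in for per-projection HMM outputs (assumed).
initial_results = ["call", "stop", "call"]
final_result = majority_vote(initial_results)  # "call"
```

In a full system each projected feature stream would feed its own trained recognizer, and the vote would be taken over the recognizers' outputs for the same utterance; how many projections are used and how ties are broken are design choices the abstract does not specify.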

CLC number: TN912.34

Cite this article: ZHOU A-zhuan, YU Yi-biao. Robust speech recognition by adopting random projection in feature space[J]. Journal of Computer Applications, 2012, 32(07): 2070-2073.
