Estimation algorithm of switching speech power spectrum for automatic speech recognition system

doi:10.11772/j.issn.1001-9081.2016.12.3369

Journal of Computer Applications, 2016, Vol. 36, Issue (12): 3369-3373.

LIU Jingang, ZHOU Yi, MA Yongbao, LIU Hongqing

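School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China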

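Received: 2016-05-25; Revised: 2016-07-12; Online: 2016-12-08; Published: 2016-12-10
Corresponding author: LIU Jingang
About the authors: LIU Jingang (1991-), male, born in Zhucheng, Shandong, M. S. candidate. His research interests include speech signal processing and speech enhancement. ZHOU Yi (1974-), male, born in Chengdu, Sichuan, Ph. D., professor. His research interests include adaptive filtering and speech signal processing. MA Yongbao (1991-), male, born in Wuwei, Gansu, M. S. candidate. His research interests include speech signal processing and speech enhancement. LIU Hongqing (1980-), male, born in Jiamusi, Heilongjiang, Ph. D., professor. His research interests include sparse signal processing and array signal processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61501072) and the Natural Science Foundation of Chongqing Science and Technology Commission (cstc2015jcyjA40027).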

Abstract

To address the poor robustness of Automatic Speech Recognition (ASR) systems in noisy environments, an estimation algorithm of switching speech power spectrum was proposed. Firstly, based on the assumption that the speech spectral amplitude is better modelled by a Chi distribution, a modified speech power spectrum estimator based on Minimum Mean Square Error (MMSE) was proposed. Then, by incorporating the Speech Presence Probability (SPP), a new SPP-based MMSE estimator was obtained. Next, this estimator and the conventional Wiener filter were combined into a switching algorithm: under heavy noise, the modified MMSE estimator was used to estimate the clean speech power spectrum; otherwise, the Wiener filter was employed to reduce the amount of computation. This yields the final switching speech power spectrum estimation algorithm for ASR systems. The experimental results show that, compared with the traditional MMSE estimator with a Rayleigh prior, the recognition rate of the proposed algorithm is improved by about 8 percentage points on average across various noise environments. The proposed algorithm improves the robustness of the ASR system by removing noise, while also reducing the computational cost.
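The abstract names a Chi-prior MMSE power estimator and its SPP-weighted version but gives no formulas. As a sketch only, the following is one standard derivation under assumptions that are not stated in the abstract: additive complex Gaussian noise with PSD \lambda_N, a uniform speech phase, a Chi (Nakagami-type) speech amplitude with shape \mu and E{A^2} = \sigma_S^2, and zero speech power under absence (H_0). The notation (\xi, \gamma, \nu, \mu, P(H_1)) is introduced here for illustration, and the paper's exact estimator may differ. Setting \mu = 1 recovers the Rayleigh-prior MMSE estimator used as the baseline.

```latex
% Assumed model: Y = S + N, N ~ CN(0, \lambda_N), uniform phase, Chi amplitude A = |S|.
% The posterior of A is proportional to a^{2\mu-1} e^{-\alpha a^2} I_0(\beta a),
% with \alpha = \mu/\sigma_S^2 + 1/\lambda_N and \beta = 2|Y|/\lambda_N.
\begin{aligned}
p(a) &= \frac{2\mu^{\mu}}{\Gamma(\mu)\,\sigma_S^{2\mu}}\,a^{2\mu-1}
        \exp\!\Big(-\frac{\mu a^{2}}{\sigma_S^{2}}\Big),
  \qquad \xi = \frac{\sigma_S^{2}}{\lambda_N},\quad
  \gamma = \frac{|Y|^{2}}{\lambda_N},\quad
  \nu = \frac{\xi\gamma}{\mu+\xi},\\[4pt]
\int_0^\infty x^{m} e^{-\alpha x^{2}} I_0(\beta x)\,dx
  &= \frac{\Gamma\big(\tfrac{m+1}{2}\big)}{2\,\alpha^{(m+1)/2}}\;
     {}_1F_1\Big(\tfrac{m+1}{2};1;\tfrac{\beta^{2}}{4\alpha}\Big),\\[4pt]
\mathbb{E}\{|S|^{2}\mid Y,H_1\}
  &= \frac{\mu\,\xi}{\mu+\xi}\,\lambda_N\,
     \frac{{}_1F_1(\mu+1;1;\nu)}{{}_1F_1(\mu;1;\nu)},\\[4pt]
\Lambda(Y) &= \frac{p(Y\mid H_1)}{p(Y\mid H_0)}
  = \Big(\frac{\mu}{\mu+\xi}\Big)^{\mu}\,{}_1F_1(\mu;1;\nu),\qquad
p(H_1\mid Y) = \frac{P(H_1)\,\Lambda(Y)}{P(H_1)\,\Lambda(Y) + 1 - P(H_1)},\\[4pt]
\mathbb{E}\{|S|^{2}\mid Y\} &= p(H_1\mid Y)\,\mathbb{E}\{|S|^{2}\mid Y,H_1\},\\[4pt]
\mu = 1:\quad \mathbb{E}\{|S|^{2}\mid Y,H_1\}
  &= \frac{\xi}{1+\xi}\,\lambda_N + \Big(\frac{\xi}{1+\xi}\Big)^{2}|Y|^{2}.
\end{aligned}
```

The \mu = 1 line follows from {}_1F_1(2;1;\nu) = (1+\nu)e^{\nu} and {}_1F_1(1;1;\nu) = e^{\nu}, which makes the hypergeometric ratio equal to 1 + \nu.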

Key words: Automatic Speech Recognition (ASR) system, robustness, Minimum Mean Square Error (MMSE), Speech Presence Probability (SPP), estimation of speech power spectrum, Wiener filter
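The abstract describes per-bin processing but no implementation. Below is a minimal Python sketch (standard library only) of one way to realize the three steps it names, assuming complex Gaussian noise with PSD `noise_psd`, a Chi amplitude prior with shape `mu`, a known a priori SNR `xi`, and a hypothetical switching threshold `xi_switch` on that SNR. The paper's actual switching criterion, parameter values, and SNR estimator are not given here, and all function names are illustrative. The Chi-prior estimate uses the identity mu * 1F1(mu+1;1;nu) / 1F1(mu;1;nu) = mu + k_bar, where k_bar is the weighted mean index of the 1F1(mu;1;nu) series; summing in the log domain keeps it stable at high SNR.

```python
import math


def _series_stats(mu, nu):
    """Return (log 1F1(mu; 1; nu), k_bar) for the Kummer series.

    Terms t_k = (mu)_k * nu**k / (k!)**2 are handled in the log domain so that
    large nu (high SNR) does not overflow; k_bar is the t_k-weighted mean of k.
    """
    # Terms peak near k ~ nu with spread ~ sqrt(nu); sum well past the peak.
    k_max = int(nu + 10.0 * math.sqrt(nu + 1.0) + mu + 50)
    logs = []
    for k in range(k_max + 1):
        log_t = math.lgamma(mu + k) - math.lgamma(mu) - 2.0 * math.lgamma(k + 1)
        if k > 0:
            log_t += k * math.log(nu) if nu > 0 else -math.inf
        logs.append(log_t)
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    k_bar = sum(k * w for k, w in enumerate(weights)) / total
    return peak + math.log(total), k_bar


def chi_mmse_power(y_pow, noise_psd, xi, mu):
    """E{|S|^2 | Y, H1} under a Chi amplitude prior with shape mu (mu=1: Rayleigh)."""
    gamma = y_pow / noise_psd            # a posteriori SNR
    nu = xi * gamma / (mu + xi)
    _, k_bar = _series_stats(mu, nu)
    # mu * 1F1(mu+1;1;nu) / 1F1(mu;1;nu) == mu + k_bar
    return noise_psd * xi / (mu + xi) * (mu + k_bar)


def speech_presence_prob(y_pow, noise_psd, xi, mu, prior_h1=0.5):
    """p(H1 | Y) from the ratio (mu/(mu+xi))**mu * 1F1(mu;1;nu); 0 < prior_h1 < 1."""
    gamma = y_pow / noise_psd
    nu = xi * gamma / (mu + xi)
    log_f, _ = _series_stats(mu, nu)
    z = mu * math.log(mu / (mu + xi)) + log_f + math.log(prior_h1 / (1.0 - prior_h1))
    # Numerically stable logistic function of the log posterior odds z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def wiener_power(y_pow, xi):
    """Wiener filter power estimate G**2 * |Y|**2 with gain G = xi / (1 + xi)."""
    gain = xi / (1.0 + xi)
    return gain * gain * y_pow


def switching_power_estimate(y_pow, noise_psd, xi, mu, prior_h1=0.5, xi_switch=1.0):
    """Hypothetical switching rule keyed on the a priori SNR of one bin."""
    if xi >= xi_switch:                  # weak noise: cheap Wiener filter
        return wiener_power(y_pow, xi)
    # Strong noise: SPP-weighted Chi-prior MMSE estimate (speech absent -> 0).
    p_h1 = speech_presence_prob(y_pow, noise_psd, xi, mu, prior_h1)
    return p_h1 * chi_mmse_power(y_pow, noise_psd, xi, mu)


if __name__ == "__main__":
    # One noisy bin: |Y|^2 = 4, noise PSD 1, a priori SNR 0.5, illustrative mu = 0.5.
    print(switching_power_estimate(4.0, 1.0, 0.5, mu=0.5))
```

With `mu=1.0` the Chi-prior estimate reduces to the Rayleigh-prior MMSE power estimate xi/(1+xi)*noise_psd + (xi/(1+xi))**2*|Y|**2, which is a convenient sanity check. The Wiener branch skips the series evaluation entirely, which mirrors the abstract's stated reason for switching: lower computation when noise is weak.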

CLC number: TN912.35

LIU Jingang, ZHOU Yi, MA Yongbao, LIU Hongqing. Estimation algorithm of switching speech power spectrum for automatic speech recognition system[J]. Journal of Computer Applications, 2016, 36(12): 3369-3373.
