Abstract: To address the low speech recognition rate in noisy environments and the difficulty that traditional beamforming algorithms have in handling spatial noise, an improved Minimum Variance Distortionless Response (MVDR) beamforming method based on a dual micro-array was proposed. Firstly, the gain of the micro-array was increased by diagonal loading, and the computational complexity was reduced by recursive matrix inversion. Then, modulation-domain spectral subtraction was applied for further processing, which avoided the musical noise that conventional spectral subtraction tends to produce, effectively reducing speech distortion while suppressing the noise well. Finally, a Convolutional Neural Network (CNN) was used to train the speech model and extract deep features of speech, effectively addressing the problem of speech signal diversity. The experimental results show that the proposed method achieves good recognition performance in the CNN-trained speech recognition system, reaching a speech recognition accuracy of 92.3% in an F16 noise environment at a signal-to-noise ratio of 10 dB, which indicates good robustness.
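The beamforming step summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `mvdr_weights` and `update_inverse`, the loading level, the forgetting factor, and the two-microphone toy data are all assumptions made for illustration. The sketch shows the textbook diagonally loaded MVDR weights, w = (R + εI)⁻¹d / (dᴴ(R + εI)⁻¹d), together with a Sherman-Morrison rank-one update, which is one common way to invert a covariance matrix recursively and may differ from the recursion used in the paper.

```python
import numpy as np


def mvdr_weights(R, d, loading=1e-2):
    """Diagonally loaded MVDR weights for covariance R and steering vector d.

    `loading` is a hypothetical loading level, not a value from the paper.
    """
    n_mics = R.shape[0]
    R_loaded = R + loading * np.eye(n_mics)   # diagonal loading
    r_inv_d = np.linalg.solve(R_loaded, d)    # R_loaded^{-1} d without forming the inverse
    return r_inv_d / np.vdot(d, r_inv_d)      # normalise so that w^H d = 1


def update_inverse(P, x, forget=0.98):
    """Return the inverse of (forget * R + x x^H), given P = R^{-1}.

    Sherman-Morrison rank-one update; `forget` is an assumed forgetting factor.
    """
    Px = P @ x
    xHP = x.conj() @ P
    denom = forget + np.vdot(x, Px)
    return (P - np.outer(Px, xHP) / denom) / forget


# Toy two-microphone example with synthetic complex noise (assumed data).
rng = np.random.default_rng(0)
d = np.array([1.0, np.exp(-1j * 0.3)])        # assumed steering vector
noise = rng.standard_normal((2, 500)) + 1j * rng.standard_normal((2, 500))
R = noise @ noise.conj().T / noise.shape[1]   # sample noise covariance

w = mvdr_weights(R, d)
response = np.vdot(w, d)                      # w^H d, the look-direction response

# Recursive update of the loaded inverse with one new snapshot.
R_loaded = R + 1e-2 * np.eye(2)
P = np.linalg.inv(R_loaded)
x_new = noise[:, 0]
P_new = update_inverse(P, x_new)
```

The look-direction response `w^H d` stays at unity whatever loading level is chosen, which is the distortionless constraint. The rank-one update replaces a full inversion per frame with a few matrix-vector products.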