Speech recognition method based on dual micro-array and convolutional neural network

doi:10.11772/j.issn.1001-9081.2019050878

Journal of Computer Applications, 2019, Vol. 39, Issue (11): 3268-3273. DOI: 10.11772/j.issn.1001-9081.2019050878

LIU Weibo, ZENG Qingning, BU Yuting, ZHENG Zhanheng

School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Received: 2019-05-23 Revised: 2019-08-09 Online: 2019-09-16 Published: 2019-11-10
Corresponding author: ZENG Qingning
About the authors: LIU Weibo (born 1991), male, from Shangqiu, Henan, M.S. candidate; main research interest: speech recognition. ZENG Qingning (born 1963), male, from Guilin, Guangxi, professor, Ph.D.; main research interests: speech signal processing and image processing. BU Yuting (born 1995), female, from Yiyang, Hunan, M.S. candidate; main research interest: speech signal processing. ZHENG Zhanheng (born 1978), male, from Yangling, Shaanxi, senior experimentalist, M.S.; main research interest: speech signal processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61461011), the Key Project of the Guangxi Natural Science Foundation (2016GXNSFDA380018), and the Director Fund of the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (CRKL160107, CRKL170108).

Abstract

To address the drop in speech recognition accuracy in noisy environments and the difficulty that traditional beamforming algorithms have with spatial noise, an improved Minimum Variance Distortionless Response (MVDR) beamforming method based on a dual micro-array structure is proposed. First, diagonal loading is used to increase the gain of the dual micro-array, and recursive matrix inversion is used to reduce the computational complexity. Then, the speech is further processed by a post-filter based on modulation-domain spectral subtraction, which avoids the musical noise that conventional spectral subtraction tends to produce, effectively reduces speech distortion, and suppresses the noise well. Finally, a Convolutional Neural Network (CNN) is used to train the speech model and extract deep features of speech, which effectively addresses the diversity of speech signals. Experimental results show that the proposed method achieves good recognition performance in the CNN-trained speech recognition system: in F16 noise at a signal-to-noise ratio of 10 dB, the speech recognition rate reaches 92.3%, indicating good robustness.

Key words: speech recognition, dual micro-array, Convolutional Neural Network (CNN), noise environment, robustness
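
The abstract names diagonal loading and recursive matrix inversion but does not give their formulas. The following is a minimal sketch of the standard textbook forms only, not the authors' implementation: MVDR weights with a loaded covariance, w = (R + εI)⁻¹d / (dᴴ(R + εI)⁻¹d), and a Sherman-Morrison rank-one update of the inverse covariance. The function names, the loading factor, the forgetting factor `lam`, the four-channel setup, and the steering vector are all illustrative assumptions.

```python
import numpy as np


def mvdr_weights(R, d, loading=1e-2):
    """MVDR weights with diagonal loading (textbook form, assumed here)."""
    m = R.shape[0]
    # Assumption: scale the loading by the mean channel power.
    eps = loading * np.real(np.trace(R)) / m
    r_inv_d = np.linalg.solve(R + eps * np.eye(m), d)
    # Normalise so that the look direction is passed undistorted (w^H d = 1).
    return r_inv_d / (d.conj() @ r_inv_d)


def update_inverse(R_inv, x, lam=0.98):
    """Sherman-Morrison update of the inverse of R_new = lam * R + x x^H.

    Avoids a full matrix inversion for every new snapshot x.
    R_inv must be Hermitian (it is the inverse of a covariance matrix).
    """
    g = R_inv @ x
    denom = lam + np.real(x.conj() @ g)
    return (R_inv - np.outer(g, g.conj()) / denom) / lam


# Toy data: 4 hypothetical channels, 200 complex snapshots of white noise.
rng = np.random.default_rng(0)
m = 4
d = np.exp(-1j * np.pi * 0.3 * np.arange(m))  # hypothetical steering vector
X = rng.standard_normal((m, 200)) + 1j * rng.standard_normal((m, 200))
R = X @ X.conj().T / X.shape[1]  # sample covariance matrix

w = mvdr_weights(R, d)
print(abs(w.conj() @ d))  # distortionless response: 1 up to rounding
```

Diagonal loading keeps the loaded matrix well conditioned when few snapshots are available, and the rank-one update costs O(M²) per snapshot rather than the O(M³) of a fresh inversion.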

CLC number: TN912.34

LIU Weibo, ZENG Qingning, BU Yuting, ZHENG Zhanheng. Speech recognition method based on dual micro-array and convolutional neural network[J]. Journal of Computer Applications, 2019, 39(11): 3268-3273.
