Journal of Computer Applications, 2019, Vol. 39, Issue (11): 3268-3273. DOI: 10.11772/j.issn.1001-9081.2019050878

• Artificial Intelligence •

Speech recognition method based on dual micro-array and convolutional neural network

LIU Weibo, ZENG Qingning, BU Yuting, ZHENG Zhanheng

  1. School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2019-05-23  Revised: 2019-08-09  Published: 2019-11-10  Online: 2019-09-16
  • Corresponding author: ZENG Qingning
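  • About the authors: LIU Weibo (1991-), male, born in Shangqiu, Henan, M. S. candidate; research interest: speech recognition. ZENG Qingning (1963-), male, born in Guilin, Guangxi, professor, Ph. D.; research interests: speech signal processing, image processing. BU Yuting (1995-), female, born in Yiyang, Hunan, M. S. candidate; research interest: speech signal processing. ZHENG Zhanheng (1978-), male, born in Yangling, Shaanxi, senior experimentalist, M. S.; research interest: speech signal processing.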
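  • Supported by:
    National Natural Science Foundation of China (61461011); Key Program of the Natural Science Foundation of Guangxi (2016GXNSFDA380018); Director Fund of the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (CRKL160107, CRKL170108).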

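Abstract: To address the drop in speech recognition rate in noisy environments and the difficulty traditional beamforming algorithms have with spatial noise, an improved Minimum Variance Distortionless Response (MVDR) beamforming method based on a dual micro-array structure was proposed. Firstly, diagonal loading was used to increase the gain of the dual micro-array, and recursive matrix inversion was used to reduce the computational complexity. Then, the speech was further processed by a post-positioned modulation-domain spectral subtraction, which overcomes the tendency of ordinary spectral subtraction to produce musical noise, effectively reduces speech distortion and achieves good noise suppression. Finally, a Convolutional Neural Network (CNN) was used to train the speech model and extract deep features of speech, effectively coping with the diversity of speech signals. Experimental results show that the proposed method achieves good recognition performance in the CNN-trained speech recognition system, reaching a speech recognition rate of 92.3% under F16 noise at a Signal-to-Noise Ratio (SNR) of 10 dB, which indicates good robustness.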

Key words: speech recognition, dual micro-array, Convolutional Neural Network (CNN), noisy environment, robustness
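The abstract names diagonal loading as the way the MVDR beamformer's gain is improved, but it gives no formula, array geometry or loading level. As a minimal sketch, assuming narrowband processing in a single frequency bin, a far-field uniform linear array, and a hypothetical loading level proportional to the average microphone power, the standard diagonally loaded MVDR weights are w = (R + δI)^{-1} a / (a^H (R + δI)^{-1} a). The helper names `steering_vector` and `mvdr_weights` and their parameters are illustrative and are not taken from the paper.

```python
import numpy as np


def steering_vector(num_mics, spacing_m, angle_deg, freq_hz, c=343.0):
    """Far-field steering vector of an ASSUMED uniform linear array."""
    # Propagation delay of each microphone relative to the first one (seconds).
    delays = np.arange(num_mics) * spacing_m * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)


def mvdr_weights(R, a, loading=1e-2):
    """Diagonally loaded MVDR weights for one frequency bin.

    w = (R + d*I)^-1 a / (a^H (R + d*I)^-1 a), where the loading level d is a
    HYPOTHETICAL choice: `loading` times the average per-microphone power.
    """
    num_mics = R.shape[0]
    d = loading * np.real(np.trace(R)) / num_mics
    R_loaded = R + d * np.eye(num_mics)
    # Solve the linear system instead of forming the explicit inverse.
    Rinv_a = np.linalg.solve(R_loaded, a)
    # Normalize so that the response toward the look direction is exactly 1.
    return Rinv_a / (a.conj() @ Rinv_a)
```

For a snapshot `x` of microphone spectra in that bin, the beamformer output is `w.conj() @ x`. The distortionless constraint keeps the response toward `a` at 1, while the loading term trades a little interference rejection for robustness to covariance estimation errors.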

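The abstract also says recursive matrix inversion is used to reduce computation, without stating the recursion. One common choice, assumed here, is the Sherman-Morrison update used in recursive least squares: if R_k = λR_{k-1} + x_k x_k^H, then R_k^{-1} = λ^{-1}[P - P x_k x_k^H P / (λ + x_k^H P x_k)] with P = R_{k-1}^{-1}, which costs O(M^2) per frame instead of the O(M^3) of a fresh inversion. The forgetting factor `lam` and the initial loading `delta` are hypothetical parameters.

```python
import numpy as np


def init_inverse(num_mics, delta):
    """P_0 = (delta * I)^-1, i.e. the recursion starts from a loaded identity.

    `delta` is a HYPOTHETICAL initial diagonal-loading level.
    """
    return np.eye(num_mics, dtype=complex) / delta


def update_inverse(P, x, lam=0.98):
    """Return the inverse of R_k = lam * R_{k-1} + x x^H, given P = R_{k-1}^-1.

    Sherman-Morrison: R_k^-1 = (P - (P x)(P x)^H / (lam + x^H P x)) / lam.
    `lam` is a HYPOTHETICAL forgetting factor.
    """
    Px = P @ x
    # x^H P x is real because P is Hermitian positive definite.
    denom = lam + np.real(x.conj() @ Px)
    return (P - np.outer(Px, Px.conj()) / denom) / lam
```

Starting from P_0 = (δI)^{-1} builds diagonal loading into the recursion, but the loaded term is multiplied by λ at every frame and fades away; keeping a fixed loading level would need a different update.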

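The reported 92.3% recognition rate was measured under F16 noise at an SNR of 10 dB. The abstract does not say how the noisy test material was produced; assuming the usual additive mixing with a global power-based SNR (no voice-activity weighting), a noise recording is scaled by sqrt(P_s / (P_n · 10^(SNR/10))) before being added to the clean speech. `mix_at_snr` is a hypothetical helper, not part of the paper.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so that 10*log10(P_speech / P_noise) == snr_db.

    ASSUMPTION: global mean-power SNR over the whole utterance.
    Returns (noisy_speech, scaled_noise).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise
```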