Journal of Computer Applications, 2020, Vol. 40, Issue (10): 3034-3040. DOI: 10.11772/j.issn.1001-9081.2020020272

• Virtual reality and multimedia computing •

Speaker recognition in strong noise environment based on auditory cortical neuronal receptive field

NIU Xiaoke1,2, HUANG Yixin1, XU Huaxing1,2, JIANG Zhenyang1   

  1. School of Electrical Engineering, Zhengzhou University, Zhengzhou Henan 450001, China;
  2. Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology (Zhengzhou University), Zhengzhou Henan 450001, China
  • Received:2020-04-12 Revised:2020-06-01 Online:2020-10-10 Published:2020-06-24
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (11804309).


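  • Corresponding author: NIU Xiaoke
  • About the authors: NIU Xiaoke (born 1987), female, from Pingdingshan, Henan, lecturer, Ph.D.; main research interests: biological information modeling and biological signal processing. HUANG Yixin (born 1994), male, from Xuchang, Henan, M.S. candidate; main research interest: robust speaker recognition based on the perceptual characteristics of human hearing. XU Huaxing (born 1988), male, from Zhumadian, Henan, lecturer, Ph.D.; main research interest: sound signal processing. JIANG Zhenyang (born 1997), male, from Luoyang, Henan, M.S. candidate; main research interest: biological information modeling.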

Abstract: To address the susceptibility of speaker recognition to environmental noise, a new voiceprint feature extraction method was proposed, drawing on the spatio-temporal filtering mechanism of the Spectro-Temporal Receptive Field (STRF) of neurons in the biological auditory cortex. In this method, secondary features were extracted from the STRF-based auditory scale-rate map and combined with the traditional Mel-Frequency Cepstral Coefficients (MFCC) to obtain voiceprint features with strong tolerance to environmental noise. With a Support Vector Machine (SVM) as the classifier, tests on speech data at different Signal-to-Noise Ratios (SNR) showed that the STRF-based features were generally more robust to noise than MFCC but had lower recognition accuracy, while the combined features improved recognition accuracy and remained robust to environmental noise. The results verify the effectiveness of the proposed method for speaker recognition in strongly noisy environments.

Key words: auditory cortex, Spectro-Temporal Receptive Field (STRF), Mel-Frequency Cepstral Coefficient (MFCC), noisy speaker recognition, Support Vector Machine (SVM)
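
The abstract reports tests on speech data at different SNRs but does not say how the noisy data were produced. As a minimal sketch of one common way to prepare such data, the snippet below scales a noise signal so that the mixture reaches a requested SNR in dB. The function names `mix_at_snr` and `measure_snr_db`, the sine-wave stand-in for speech, and the white-noise source are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return (noisy, scaled_noise) so that speech vs. scaled_noise has the given SNR in dB.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)

    # Repeat or trim the noise so it covers the whole speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must both have non-zero power")

    # SNR_dB = 10 * log10(P_speech / (g^2 * P_noise)), solved for the gain g.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


def measure_snr_db(speech, noise):
    """Measure the SNR in dB between a clean signal and a noise signal."""
    return 10.0 * np.log10(np.mean(np.square(speech)) / np.mean(np.square(noise)))


# Assumed test signals: a 220 Hz tone stands in for speech, white noise for the environment.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2.0 * np.pi * 220.0 * t)
noise = rng.standard_normal(8000)

noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=0.0)
```

Calling `mix_at_snr` over a range of target values (for example from clean speech down to negative SNRs) would give a set of test conditions for comparing the noise robustness of different feature sets.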

