Speaker recognition in strong noise environment based on auditory cortical neuronal receptive field

doi:10.11772/j.issn.1001-9081.2020020272

Journal of Computer Applications, 2020, Vol. 40, Issue (10): 3034-3040. DOI: 10.11772/j.issn.1001-9081.2020020272

Virtual reality and multimedia computing

NIU Xiaoke¹,², HUANG Yixin¹, XU Huaxing¹,², JIANG Zhenyang¹

1. School of Electrical Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China;
2. Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology (Zhengzhou University), Zhengzhou, Henan 450001, China

Received: 2020-04-12 Revised: 2020-06-01 Online: 2020-06-24 Published: 2020-10-10
Supported by:
This work is partially supported by the National Natural Science Foundation of China (11804309).

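Corresponding author: NIU Xiaoke
About the authors: NIU Xiaoke (born 1987), female, from Pingdingshan, Henan, lecturer, Ph.D.; main research interests: biological information modeling and biological signal processing. HUANG Yixin (born 1994), male, from Xuchang, Henan, M.S. candidate; main research interest: robust speaker recognition based on the perceptual characteristics of human hearing. XU Huaxing (born 1988), male, from Zhumadian, Henan, lecturer, Ph.D.; main research interest: sound signal processing. JIANG Zhenyang (born 1997), male, from Luoyang, Henan, M.S. candidate; main research interest: biological information modeling.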

Abstract

Abstract: To address the susceptibility of speaker recognition to environmental noise, a new voiceprint feature extraction method was proposed based on the spatio-temporal filtering mechanism of the Spectro-Temporal Receptive Field (STRF) of neurons in the biological auditory cortex. In this method, secondary features were extracted from the STRF-based auditory scale-rate map and combined with the traditional Mel-Frequency Cepstral Coefficient (MFCC) to obtain voiceprint features with strong tolerance to environmental noise. With a Support Vector Machine (SVM) as the feature classifier, tests on speech data with different Signal-to-Noise Ratios (SNRs) showed that the STRF-based features were generally more robust to noise than MFCC but had lower recognition accuracy; the combined features improved recognition accuracy while remaining robust to environmental noise. The results verify the effectiveness of the proposed method for speaker recognition in strong noise environments.
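
The evaluation setup summarized above (noisy test speech at several SNRs, and a combined STRF + MFCC feature vector) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mix_at_snr`, `measure_snr_db` and `combine_features` are hypothetical, the noise is assumed to be additive, and plain concatenation is only one possible way of combining the two feature sets, since the abstract does not state the fusion rule.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Assumes additive noise and equal-length sample sequences.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measure_snr_db(speech, noisy):
    """Return the SNR in dB of `noisy` relative to the clean `speech`."""
    p_speech = sum(s * s for s in speech)
    p_resid = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_speech / p_resid)


def combine_features(strf_feats, mfcc_feats):
    """Concatenate two per-utterance feature vectors (assumed fusion rule)."""
    return list(strf_feats) + list(mfcc_feats)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for a clean utterance and a noise recording.
    clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
    babble = [rng.gauss(0.0, 1.0) for _ in range(200)]
    for target in (10.0, 0.0, -5.0):
        noisy = mix_at_snr(clean, babble, target)
        print(target, round(measure_snr_db(clean, noisy), 3))
```

In such a setup, the combined vector returned by `combine_features` would be passed to the classifier (an SVM in the paper), and accuracy would be reported separately for each target SNR.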

Key words: auditory cortex, Spectro-Temporal Receptive Field (STRF), Mel-Frequency Cepstral Coefficient (MFCC), noisy speaker recognition, Support Vector Machine (SVM)

CLC Number: TP391.4

NIU Xiaoke, HUANG Yixin, XU Huaxing, JIANG Zhenyang. Speaker recognition in strong noise environment based on auditory cortical neuronal receptive field[J]. Journal of Computer Applications, 2020, 40(10): 3034-3040.
