Speaker recognition method based on Mel frequency cepstrum coefficient and inverted Mel frequency cepstrum coefficient

doi:10.3724/SP.J.1087.2012.02542

Journal of Computer Applications, 2012, Vol. 32, Issue (09): 2542-2544.

HU Feng-song, ZHANG Xuan*

College of Information Science and Engineering, Hunan University, Changsha Hunan 410082, China

Received: 2012-03-13 Revised: 2012-06-04 Online: 2012-09-01 Published: 2012-09-01
Corresponding author: ZHANG Xuan
About the authors: HU Feng-song (1969-), male, born in Changsha, Hunan, associate professor, Ph.D.; research interests: digital image processing, speaker recognition. ZHANG Xuan (1988-), female, born in Zhangjiajie, Hunan, master's student; research interests: speaker recognition.

Abstract

To improve the recognition rate of speaker recognition systems, a new feature extraction method is proposed that uses the Mel Frequency Cepstrum Coefficient (MFCC) and the Inverted MFCC (IMFCC) as feature parameters. The method combines MFCC and IMFCC by means of the Fisher criterion to construct a mixed feature parameter. Experimental results show that, compared with MFCC alone, the mixed feature achieves better recognition performance both on a clean speech database and in noisy environments.

Key words: speaker recognition, Mel Frequency Cepstrum Coefficient (MFCC), Inverted MFCC (IMFCC), Fisher criterion, Gaussian Mixture Model (GMM)
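
The abstract does not spell out how the Fisher criterion is applied, so the following is only a minimal sketch of one plausible reading: build an inverted filter bank by flipping a Mel filter bank, compute cepstra from both banks, then rank every MFCC and IMFCC dimension by its Fisher ratio (variance of the per-speaker means over the mean within-speaker variance) and keep the top-ranked ones. All function names, the top-`k` selection rule, and the parameter values are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filter bank, shape (n_filters, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edges are equally spaced on the Mel scale from 0 Hz to Nyquist.
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def inverted_filterbank(bank):
    """Assumed construction: flip filter order and frequency axis, so that
    the narrow, densely packed filters sit at high frequencies."""
    return bank[::-1, ::-1]


def cepstra(power_spec, bank, n_ceps):
    """Cepstral coefficients from a (frames, bins) power spectrum."""
    log_energy = np.log(power_spec @ bank.T + 1e-10)  # (frames, n_filters)
    n = bank.shape[0]
    # DCT-II basis: row c, column j holds cos(pi * c * (j + 0.5) / n).
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n) + 0.5) / n)
    return log_energy @ basis.T


def fisher_ratio(features, speaker_ids):
    """Per-dimension ratio of between-speaker to within-speaker variance."""
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)
    means = np.stack([features[speaker_ids == s].mean(axis=0) for s in speakers])
    within = np.stack([features[speaker_ids == s].var(axis=0) for s in speakers]).mean(axis=0)
    return means.var(axis=0) / (within + 1e-12)


def select_mixed_features(mfcc, imfcc, speaker_ids, k):
    """Stack MFCC and IMFCC, keep the k dimensions with the largest ratio."""
    stacked = np.hstack([mfcc, imfcc])
    scores = fisher_ratio(stacked, speaker_ids)
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return stacked[:, keep], keep
```

In this sketch the returned index array records which MFCC and IMFCC dimensions survive, so the same selection can be reapplied to test utterances before they are scored against the speaker models, for example the Gaussian mixture models named in the key words.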

CLC Number: TN912.34

HU Feng-song, ZHANG Xuan. Speaker recognition method based on Mel frequency cepstrum coefficient and inverted Mel frequency cepstrum coefficient[J]. Journal of Computer Applications, 2012, 32(09): 2542-2544.


