SMFCC: a novel feature extraction method for speech signal

Journal of Computer Applications, 2016, Vol. 36, Issue (6): 1735-1740. DOI: 10.11772/j.issn.1001-9081.2016.06.1735

WANG Haibin^1, YU Zhengtao^1,2, MAO Cunli^1,2, GUO Jianyi^1,2

1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China;
2. Intelligent Information Processing Key Laboratory, Kunming University of Science and Technology, Kunming Yunnan 650500, China

Received: 2015-11-02 Revised: 2016-01-18 Online: 2016-06-08 Published: 2016-06-10
Corresponding author: MAO Cunli
About the authors: WANG Haibin (born 1988), male, from Ma'anshan, Anhui, M.S. candidate; main research interests: speech signal processing, speech recognition. YU Zhengtao (born 1970), male, from Qujing, Yunnan, professor, Ph.D., CCF member; main research interests: natural language processing, information extraction, speech recognition. MAO Cunli (born 1977), male, from Qujing, Yunnan, lecturer, Ph.D., CCF member; main research interests: natural language processing, information extraction, speech recognition. GUO Jianyi (born 1964), female, from Qujing, Yunnan, professor, M.S., CCF member; main research interests: natural language processing, information extraction, speech recognition.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61262041, 61472168) and the Key Project of the Natural Science Foundation of Yunnan Province (2013FA030).

Abstract

Abstract: To address the problems of extracting effective speech features and of noise interference in speaker recognition systems, a novel speech feature extraction method, Mel Frequency Cepstral Coefficients based on the S-transform (SMFCC), is proposed. Building on traditional Mel Frequency Cepstral Coefficients (MFCC), the method exploits the two-dimensional Time-Frequency (TF) multiresolution property of the S-transform and the ability of Singular Value Decomposition (SVD) to denoise a two-dimensional TF matrix effectively, and combines them with related statistical analysis methods to obtain the final speech features. The proposed features were compared with existing features in experiments on the TIMIT corpus. The Equal Error Rate (EER) and Minimum Detection Cost Function (MinDCF) of SMFCC were lower than those of Linear Prediction Cepstral Coefficients (LPCC), MFCC, and their combination LMFCC; compared with MFCC, the EER and MinDCF08 of SMFCC decreased by 3.6% and 17.9%, respectively. The experimental results show that the proposed method can effectively remove noise from the speech signal and improve local resolution.

Key words: S-transform, Singular Value Decomposition (SVD), Mel Frequency Cepstral Coefficients based on S-transform (SMFCC), Gaussian Mixture Model-Universal Background Model (GMM-UBM), speaker recognition
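
The first two stages named in the abstract, an S-transform time-frequency matrix followed by SVD denoising, can be sketched roughly as below. This is a minimal illustration and not the authors' implementation. The function names `s_transform` and `svd_denoise`, the retained rank of 3, the 8 kHz sampling rate, and the synthetic test tone are all assumptions made for the example. Applying the SVD to the magnitude of the TF matrix is likewise an assumption, and the Mel filtering, cepstral, and statistical steps are omitted.

```python
import numpy as np


def s_transform(x):
    """Discrete S-transform via the FFT (frequency-domain formulation).

    Returns a complex matrix of shape (N//2 + 1, N): rows are frequency
    indices 0..N//2, columns are time samples.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.fft(x)
    # Signed integer frequency offsets, so the Gaussian wraps symmetrically.
    m = np.fft.fftfreq(N) * N
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    # The zero-frequency row is defined as the signal mean.
    S[0, :] = x.mean()
    for n in range(1, N // 2 + 1):
        # Frequency-dependent Gaussian window (wider at higher frequency).
        gauss = np.exp(-2.0 * np.pi ** 2 * m ** 2 / n ** 2)
        # Shift the spectrum by n, apply the window, return to time domain.
        S[n, :] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S


def svd_denoise(tf_matrix, rank):
    """Keep only the `rank` largest singular values of a TF matrix."""
    U, s, Vh = np.linalg.svd(tf_matrix, full_matrices=False)
    s[rank:] = 0.0  # discard small singular values, assumed to be noise
    return (U * s) @ Vh


# Illustrative input: a noisy 440 Hz tone sampled at an assumed 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(256) / 8000.0
x = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(256)

S = s_transform(x)                         # complex TF matrix
tf_clean = svd_denoise(np.abs(S), rank=3)  # assumed: denoise the magnitude
print(S.shape, tf_clean.shape)
```

In practice the number of retained singular values would have to be chosen from the singular value spectrum of the data; the fixed value here is only a placeholder.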

CLC Number: TN912.34

WANG Haibin, YU Zhengtao, MAO Cunli, GUO Jianyi. SMFCC: a novel feature extraction method for speech signal[J]. Journal of Computer Applications, 2016, 36(6): 1735-1740.
