Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (6): 1735-1740. DOI: 10.11772/j.issn.1001-9081.2016.06.1735

• Virtual Reality and Digital Media •

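SMFCC: a novel feature extraction method for speech signal

WANG Haibin1, YU Zhengtao1,2, MAO Cunli1,2, GUO Jianyi1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China;
  2. Intelligent Information Processing Key Laboratory, Kunming University of Science and Technology, Kunming Yunnan 650500, China
  • Received: 2015-11-02 Revised: 2016-01-18 Online: 2016-06-10 Published: 2016-06-08
  • Corresponding author: MAO Cunli
  • About the authors: WANG Haibin (1988-), male, born in Ma'anshan, Anhui, M.S. candidate; research interests: speech signal processing, speech recognition. YU Zhengtao (1970-), male, born in Qujing, Yunnan, professor, Ph.D., CCF member; research interests: natural language processing, information extraction, speech recognition. MAO Cunli (1977-), male, born in Qujing, Yunnan, lecturer, Ph.D., CCF member; research interests: natural language processing, information extraction, speech recognition. GUO Jianyi (1964-), female, born in Qujing, Yunnan, professor, M.S., CCF member; research interests: natural language processing, information extraction, speech recognition.
  • Supported by:
    National Natural Science Foundation of China (61262041, 61472168); Key Project of the Natural Science Foundation of Yunnan Province (2013FA030).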

Abstract: To address the problems of effective speech feature extraction and noise interference in speaker recognition, a novel speech feature extraction method, Mel Frequency Cepstral Coefficients based on S-transform (SMFCC), was proposed. Building on traditional Mel Frequency Cepstral Coefficients (MFCC), the method exploits the two-dimensional Time-Frequency (TF) multiresolution property of the S-transform and the effective denoising of the two-dimensional TF matrix by Singular Value Decomposition (SVD), and combines them with related statistical analysis methods to obtain the final speech features. On the TIMIT corpus, the proposed features were compared experimentally with existing features. The Equal Error Rate (EER) and Minimum Detection Cost Function (MinDCF) of SMFCC were lower than those of Linear Prediction Cepstral Coefficients (LPCC), MFCC, and their combination LMFCC; in particular, compared with MFCC, the EER and MinDCF08 of SMFCC decreased by 3.6% and 17.9% respectively. The experimental results show that the proposed method can effectively remove noise from the speech signal and improve local resolution.
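
The two building blocks named in the abstract, the S-transform and SVD-based denoising of the TF matrix, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the frame length, the 8 kHz sampling rate, the retained rank of 4, the choice to denoise the magnitude matrix, and the function names `s_transform` and `svd_denoise` are all assumptions made for illustration, and the later Mel filtering and cepstral steps are not shown.

```python
import numpy as np


def s_transform(x):
    """Discrete S-transform of a real 1-D signal.

    Returns a complex matrix of shape (N // 2 + 1, N): rows are
    frequency indices 0..N/2, columns are time indices.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    H = np.fft.fft(x)
    # Signed frequency offsets in FFT order: 0, 1, ..., -2, -1
    m = np.fft.fftfreq(N) * N
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    S[0, :] = x.mean()  # the zero-frequency row is the signal mean
    for n in range(1, N // 2 + 1):
        # Gaussian window in the frequency domain; its width grows with n
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        # Shift the spectrum by n, apply the window, return to the time domain
        S[n, :] = np.fft.ifft(np.roll(H, -n) * gauss)
    return S


def svd_denoise(M, rank):
    """Keep the `rank` largest singular values of M and zero the rest."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    s = s.copy()
    s[rank:] = 0.0
    return (U * s) @ Vh


# Synthetic noisy frame (hypothetical parameters, not taken from the paper)
rng = np.random.default_rng(0)
t = np.arange(256) / 8000.0
noisy = np.sin(2 * np.pi * 1000 * t) + 0.3 * rng.standard_normal(t.size)

S = s_transform(noisy)
tf_denoised = svd_denoise(np.abs(S), rank=4)  # assumed: denoise the magnitude matrix
```

In this sketch the number of retained singular values is fixed by hand; how the threshold is chosen in SMFCC is not stated in the abstract.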

Key words: S-transform, Singular Value Decomposition (SVD), Mel Frequency Cepstral Coefficients based on S-transform (SMFCC), Gaussian Mixture Model-Universal Background Model (GMM-UBM), speaker recognition

CLC number: