Journal of Computer Applications, 2016, Vol. 36, Issue (5): 1421-1425. DOI: 10.11772/j.issn.1001-9081.2016.05.1421

• Virtual Reality and Digital Media •

Feature combination method based on Fisher criterion in speaker recognition

XIE Xiaojuan, ZENG Yicheng, XIONG Bingfeng

  1. School of Physics and Optoelectric Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
  • Received: 2015-09-29 Revised: 2016-01-11 Online: 2016-05-10 Published: 2016-05-09
  • Corresponding author: XIE Xiaojuan
  • About the authors: XIE Xiaojuan (born 1989), female, from Hengyang, Hunan, M.S. candidate, main research interest: speech signal processing; ZENG Yicheng (born 1962), male, from Lianyuan, Hunan, professor, Ph.D., main research interest: speech signal processing; XIONG Bingfeng (born 1991), male, from Huarong, Hunan, M.S. candidate, main research interest: speech signal processing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61471310).

Abstract: Speaker recognition accuracy can be improved by using several feature parameters simultaneously. However, the individual dimensions of a combined feature vector may influence the recognition result differently, so treating them equally is not necessarily optimal. To address this problem, a feature extraction method based on the Fisher criterion was proposed that combines Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Mel Frequency Cepstrum Coefficient (LPMFCC) and Teager Energy Operator Cepstrum Coefficient (TEOCC) parameters. Firstly, the MFCC, LPMFCC and TEOCC parameters were extracted from the speech signal. Then, the Fisher ratio of each dimension of the MFCC and LPMFCC parameters was calculated, and six components with high Fisher ratios were selected from each and combined with the TEOCC parameters into a mixed feature. Finally, speaker recognition experiments were carried out on the TIMIT speech corpus and the NOISEX-92 noise database. The simulation results show that, compared with MFCC, LPMFCC, MFCC+LPMFCC, the Fisher-ratio-based MFCC mixed feature extraction method and the feature extraction method based on Principal Component Analysis (PCA), the proposed method increased the average recognition rate obtained with the Gaussian Mixture Model (GMM) and the Back Propagation (BP) neural network by 21.65, 18.39, 15.61, 15.01 and 22.70 percentage points respectively on clean speech, and by 15.15, 10.81, 8.69, 7.64 and 17.76 percentage points respectively in a 30 dB noise environment. The results show that the mixed feature effectively improves the speaker recognition rate and is more robust.
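The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a commonly used form of the Fisher ratio, namely the variance of the per-speaker means divided by the average within-speaker variance; this page does not give the paper's exact formula, so that definition is an assumption. The function names `fisher_ratios`, `top_k_dims` and `combine` are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def fisher_ratios(features_by_speaker):
    """Per-dimension Fisher ratio (assumed form: between / within variance).

    features_by_speaker maps a speaker id to a list of frames,
    where each frame is a list of floats of equal length.
    """
    speakers = list(features_by_speaker)
    dim = len(features_by_speaker[speakers[0]][0])
    ratios = []
    for d in range(dim):
        # Values of dimension d, grouped by speaker.
        per_speaker = [[frame[d] for frame in features_by_speaker[s]]
                       for s in speakers]
        means = [fmean(vals) for vals in per_speaker]
        overall = fmean(means)
        # Between-speaker scatter: spread of the speaker means.
        between = fmean((m - overall) ** 2 for m in means)
        # Within-speaker scatter: average variance inside each speaker.
        within = fmean(
            fmean((v - m) ** 2 for v in vals)
            for vals, m in zip(per_speaker, means)
        )
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def top_k_dims(ratios, k=6):
    """Indices of the k dimensions with the highest Fisher ratio, in ascending order."""
    ranked = sorted(range(len(ratios)), key=lambda d: ratios[d], reverse=True)
    return sorted(ranked[:k])


def combine(mfcc_frame, lpmfcc_frame, teocc_frame, mfcc_idx, lpmfcc_idx):
    """Concatenate the selected MFCC and LPMFCC components with the full TEOCC frame."""
    return ([mfcc_frame[i] for i in mfcc_idx]
            + [lpmfcc_frame[i] for i in lpmfcc_idx]
            + list(teocc_frame))
```

In this sketch, `top_k_dims` would be called once on the MFCC ratios and once on the LPMFCC ratios with `k=6`, and `combine` would then be applied frame by frame. How the TEOCC frames themselves are computed is outside its scope.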

Key words: speaker recognition, Fisher criterion, Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coefficient (LPC), Teager Energy Operator (TEO)
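For readers unfamiliar with the Teager Energy Operator named in the key words, the following minimal sketch shows its standard discrete form, ψ[x(n)] = x(n)² − x(n−1)·x(n+1). It covers only the operator itself; the steps that turn it into TEOCC cepstral parameters are not described on this page and are not reproduced here. The function name `teager_energy` and the test signal are illustrative assumptions.

```python
import math


def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The result covers n = 1 .. len(x) - 2, so it is two samples
    shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative signal: a pure cosine with assumed amplitude and frequency.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]
energy = teager_energy(signal)
```

For a pure sinusoid the operator returns the constant value A²·sin²(Ω), so it tracks amplitude and frequency together. This property is the usual motivation for using it in speech features.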

CLC number: