Journal of Computer Applications, 2018, Vol. 38, Issue (10): 2839-2843. DOI: 10.11772/j.issn.1001-9081.2018030598


Short utterance speaker recognition algorithm based on multi-featured i-vector

SUN Nian1, ZHANG Yi1, LIN Haibo2, HUANG Chao1   

  1. School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
    2. School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2018-03-23; Revised: 2018-05-30; Online: 2018-10-10; Published: 2018-10-13
  • Supported by:
    This work is partially supported by the Chongqing Research Special Project of Basic Science and Frontier Technology (cstc2015jcyjBX0066).

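  • Corresponding author: SUN Nian
  • About the authors: SUN Nian (born 1993), female, native of Chongqing, M.S. candidate; research interest: speaker recognition. ZHANG Yi (born 1966), male, native of Chongqing, professor, Ph.D.; research interests: human-robot interaction, error theory. LIN Haibo (born 1965), male, native of Chongqing, associate professor, M.S.; research interests: robotics, automatic control, pattern recognition. HUANG Chao (born 1982), native of Chongqing, lecturer, Ph.D.; research interests: high-performance electromechanical drive systems, intelligent robots.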

Abstract: When the test speech is long enough, a single feature carries enough information and discriminative power to accomplish the speaker recognition task. When the test speech is very short, however, speaker recognition performance drops sharply because the signal contains too little speaker information and the features are insufficiently discriminative. To address this lack of speaker information under short-utterance conditions, a short utterance speaker recognition algorithm based on multi-featured i-vector is proposed. First, different acoustic feature vectors are extracted and concatenated into a high-dimensional feature vector. Then, Principal Component Analysis (PCA) is applied to remove the correlation among the feature dimensions so that the features become orthogonal. Finally, Linear Discriminant Analysis (LDA) selects the most discriminative features and, to some extent, reduces the dimensionality of the space. As a result, the multi-featured system achieves better speaker recognition performance. Experiments on the TIMIT corpus show that, under the same short-utterance condition (2 s), the Equal Error Rate (EER) of the multi-featured system is relatively reduced by 72.16%, 69.47% and 73.62% compared with the i-vector single-feature systems based on Mel-Frequency Cepstrum Coefficient (MFCC), Linear Prediction Cepstrum Coefficient (LPCC) and Perceptual Log Area Ratio (PLAR), respectively. For short utterances of different durations, the proposed algorithm reduces both the EER and the Detection Cost Function (DCF) by roughly 50% compared with the i-vector single-feature systems. These results indicate that the multi-featured system makes full use of the speaker-specific information in short utterances and thereby improves speaker recognition performance.
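The three processing steps described in the abstract (concatenate several feature types, decorrelate them with PCA, keep the most discriminative directions with LDA) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the acoustic feature extractors and the i-vector front end are not reproduced, the three input matrices are synthetic stand-ins for MFCC-, LPCC- and PLAR-based vectors, and the helper names (`fuse_features`, `pca_decorrelate`, `lda_project`) and component counts are hypothetical choices.

```python
import numpy as np


def fuse_features(*feature_sets):
    """Concatenate per-sample feature matrices (n_samples x d_i) into one high-dimensional vector."""
    # Hypothetical helper: each argument stands in for one feature type.
    return np.hstack(feature_sets)


def pca_decorrelate(x, n_components):
    """Project centred data onto the leading principal axes; output dimensions are uncorrelated."""
    mean = x.mean(axis=0)
    xc = x - mean
    cov = np.cov(xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    w = eigvecs[:, order]
    return xc @ w, mean, w


def lda_project(x, labels, n_components):
    """Fisher LDA: maximise between-speaker scatter relative to within-speaker scatter."""
    mu = x.mean(axis=0)
    d = x.shape[1]
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        s_w += (xc - mc).T @ (xc - mc)              # within-speaker scatter
        diff = (mc - mu)[:, None]
        s_b += len(xc) * (diff @ diff.T)            # between-speaker scatter
    # Solve S_b v = lambda S_w v by whitening S_w, then diagonalising the whitened S_b.
    w_vals, w_vecs = np.linalg.eigh(s_w)
    whiten = w_vecs @ np.diag(1.0 / np.sqrt(np.maximum(w_vals, 1e-10))) @ w_vecs.T
    b_vals, b_vecs = np.linalg.eigh(whiten @ s_b @ whiten)
    order = np.argsort(b_vals)[::-1][:n_components]
    proj = whiten @ b_vecs[:, order]
    return x @ proj, proj


# Toy usage with synthetic data (4 hypothetical speakers, 30 vectors each).
rng = np.random.default_rng(0)
n_spk, n_per = 4, 30
labels = np.repeat(np.arange(n_spk), n_per)
centres = 3.0 * rng.normal(size=(n_spk, 20))
base = centres[labels] + rng.normal(size=(n_spk * n_per, 20))
mfcc_like, lpcc_like, plar_like = base[:, :8], base[:, 8:14], base[:, 14:]  # stand-ins only
fused = fuse_features(mfcc_like, lpcc_like, plar_like)
z, pca_mean, pca_w = pca_decorrelate(fused, n_components=12)
y, lda_w = lda_project(z, labels, n_components=n_spk - 1)
print(fused.shape, z.shape, y.shape)  # (120, 20) (120, 12) (120, 3)
```

Whitening the within-speaker scatter before the eigendecomposition is one standard way to solve the Fisher criterion with a symmetric eigensolver. Note that LDA can keep at most (number of speakers minus 1) meaningful directions, which is why the toy example projects to 3 dimensions.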
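The abstract reports results as EER, DCF and relative EER reduction. The sketch below shows one common way these quantities are computed from verification scores; it is an illustrative assumption rather than the paper's evaluation code. The threshold sweep, the helper names, the DCF parameters (`p_target=0.01`, `c_miss=10`, `c_fa=1`, NIST SRE 2008-style values, left unnormalised) and the toy scores are all hypothetical and are not taken from the paper.

```python
import numpy as np


def _error_rates(target_scores, nontarget_scores):
    """Miss and false-alarm rates at every observed score used as a threshold (plus +inf)."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.append(np.sort(np.concatenate([tgt, non])), np.inf)
    p_miss = np.array([np.mean(tgt < t) for t in thresholds])  # genuine trials rejected
    p_fa = np.array([np.mean(non >= t) for t in thresholds])   # impostor trials accepted
    return p_miss, p_fa


def equal_error_rate(target_scores, nontarget_scores):
    """EER approximated at the threshold where miss and false-alarm rates are closest."""
    p_miss, p_fa = _error_rates(target_scores, nontarget_scores)
    i = np.argmin(np.abs(p_miss - p_fa))
    return float((p_miss[i] + p_fa[i]) / 2)


def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Minimum (unnormalised) detection cost over thresholds; default costs are assumed."""
    p_miss, p_fa = _error_rates(target_scores, nontarget_scores)
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return float(cost.min())


def relative_reduction(baseline, proposed):
    """Relative decrease of an error metric (e.g. EER) in percent."""
    return 100.0 * (baseline - proposed) / baseline


tgt = [0.4, 0.6, 0.8, 0.9]  # toy genuine-trial scores (illustrative only)
non = [0.1, 0.3, 0.5, 0.7]  # toy impostor-trial scores (illustrative only)
print(equal_error_rate(tgt, non))    # 0.25
print(relative_reduction(8.0, 4.0))  # 50.0
```

Lower EER and DCF mean better verification. The relative reduction is measured against the single-feature baseline, which is how percentages such as the 72.16% above are read; the 8.0 and 4.0 in the usage line are purely illustrative numbers, not results from the paper.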

Key words: speaker recognition, i-vector, short utterance, multi-feature, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA)


