Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (08): 2083-2086. DOI: 10.3724/SP.J.1087.2011.02083

• Artificial Intelligence •

Speaker recognition based on linear log-likelihood kernel function

Liang HE, Jia LIU

  1. Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  • Received: 2011-01-24 Revised: 2011-03-02 Online: 2011-08-01 Published: 2011-08-01
  • Contact: Liang HE
  • About the authors: Liang HE (1981-), male, born in Jinzhou, Liaoning, Ph.D. candidate; main research interests: speaker recognition, language recognition. Jia LIU (1954-), male, born in Fuzhou, Fujian, professor and doctoral supervisor; main research interests: speech recognition, signal processing.
  • Supported by:

    National Natural Science Foundation of China (90920302); National Natural Science Foundation of China (61005019); National 863 Program of China (2008AA040201)

Abstract: To improve the performance of text-independent speaker recognition, a speaker recognition system based on a linear log-likelihood kernel function is proposed. The kernel uses a Gaussian Mixture Model (GMM) to compress the cepstral feature sequence of an utterance. The similarity between two feature sequences is then converted into a distance between GMM parameters. From this distance expression, the polarization identity is applied to derive a mapping from a feature sequence to a high-dimensional vector space. Finally, in that space, a Support Vector Machine (SVM) is used to build a model for each target speaker. Experimental results on the speaker recognition database released by the National Institute of Standards and Technology (NIST) show that the proposed kernel achieves excellent recognition performance.

Key words: speaker recognition, kernel method, Support Vector Machine (SVM), Gaussian Mixture Model (GMM), log-likelihood
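The step from a distance between GMM parameters to an explicit feature mapping can be illustrated with a small sketch. It is not the paper's derivation. It assumes a commonly used weighted distance between adapted GMM means with shared weights and diagonal covariances, and the function names (`gmm_distance_sq`, `supervector`, `kernel_from_distance`) are hypothetical. The polarization identity turns the squared distance into an inner product, and that inner product equals a plain dot product between scaled mean "supervectors", which is the kind of vector a linear SVM could then be trained on.

```python
import math

def gmm_distance_sq(means_a, means_b, weights, variances):
    """Assumed distance: sum_i w_i * (mu_a_i - mu_b_i)^T Sigma_i^{-1} (mu_a_i - mu_b_i),
    with diagonal covariances given as per-dimension variances."""
    total = 0.0
    for mu_a, mu_b, w, var in zip(means_a, means_b, weights, variances):
        total += w * sum((x - y) ** 2 / v for x, y, v in zip(mu_a, mu_b, var))
    return total

def supervector(means, weights, variances):
    """Explicit mapping: stack sqrt(w_i / var_i) * mu_i over all components."""
    return [math.sqrt(w / v) * x
            for mu, w, var in zip(means, weights, variances)
            for x, v in zip(mu, var)]

def dot(u, v):
    """Plain Euclidean inner product (the linear kernel)."""
    return sum(a * b for a, b in zip(u, v))

def kernel_from_distance(means_a, means_b, weights, variances):
    """Polarization identity: <a, b> = (||a||^2 + ||b||^2 - ||a - b||^2) / 2,
    where the norms are measured with the assumed GMM distance."""
    zeros = [[0.0] * len(mu) for mu in means_a]
    norm_a = gmm_distance_sq(means_a, zeros, weights, variances)
    norm_b = gmm_distance_sq(means_b, zeros, weights, variances)
    dist_ab = gmm_distance_sq(means_a, means_b, weights, variances)
    return 0.5 * (norm_a + norm_b - dist_ab)

# Toy values (made up for illustration): 2 mixture components, 2 dimensions.
weights = [0.6, 0.4]
variances = [[1.0, 4.0], [0.25, 1.0]]
means_a = [[1.0, 2.0], [0.5, -1.0]]   # adapted means for utterance A
means_b = [[0.0, 1.0], [1.0, 1.0]]    # adapted means for utterance B

k_polar = kernel_from_distance(means_a, means_b, weights, variances)
k_linear = dot(supervector(means_a, weights, variances),
               supervector(means_b, weights, variances))
print(k_polar, k_linear)  # the two values agree up to rounding
```

Because both routes give the same value, the kernel is linear in the mapped space, so each utterance can be mapped once and the resulting vectors passed to any linear SVM trainer.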

CLC Number: