Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (2): 556-562. DOI: 10.11772/j.issn.1001-9081.2023020157

• Multimedia computing and computer simulation •

Channel compensation algorithm for speaker recognition based on probabilistic spherical discriminant analysis

Weipeng JING, Qingxin XIAO, Hui LUO

  1. School of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang 150006, China
  • Received: 2023-02-21 Revised: 2023-04-18 Accepted: 2023-04-21 Online: 2023-08-14 Published: 2024-02-10
  • Corresponding author: Hui LUO
  • About the authors: JING Weipeng, born in 1979 in Hegang, Heilongjiang, Ph. D., professor, CCF senior member. His research interests include artificial intelligence.
    XIAO Qingxin, born in 1999 in Tai'an, Shandong, M. S. candidate. His research interests include speaker recognition.
  • Supported by:
    National Natural Science Foundation of China (62101114)

Abstract:

In speaker recognition, the Probabilistic Linear Discriminant Analysis (PLDA) model is a commonly used classification backend. However, the distribution assumption of the Gaussian PLDA model does not accurately fit the real distribution of speaker features. As a result, channel compensation by length normalization, which rests on the Gaussian assumption, can destroy the independence of the within-class distribution of speaker features, so Gaussian PLDA cannot fully exploit the speaker information contained in the features extracted by the upstream task, and recognition results suffer. To address this issue, a Channel Compensation algorithm based on Probabilistic Spherical Discriminant Analysis (CC-PSDA) was proposed. It replaces the Gaussian-assumption PLDA method with a Probabilistic Spherical Discriminant Analysis (PSDA) model under a von Mises-Fisher (VMF) distribution assumption, combined with a feature transformation method, to avoid the impact of channel compensation on the independence of the within-class distribution. Firstly, so that the speaker features conform to the VMF prior assumption and fit the backend classification model, a nonlinear transformation was applied at the feature level to transform the distribution of the speaker features. Then, exploiting the fact that the VMF-based PSDA model does not destroy the within-class distribution structure of speaker features, the transformed features were defined on a hypersphere of a specific dimension, maximizing the inter-class distance. The proposed model was solved by the Expectation-Maximization (EM) algorithm, and the classification task was finally completed. Experimental results show that, on three test sets, the improved algorithm achieves the lowest equal error rate (EER) among the compared models, which include PSDA and Gaussian PLDA. Therefore, the proposed algorithm can effectively distinguish speaker features and improve recognition performance.
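
The abstract refers to length normalization as a channel compensation step and reports results as equal error rates. As a rough illustration of those two notions only, the following Python sketch projects feature vectors onto the unit hypersphere, scores a pair of vectors by cosine similarity, and estimates an EER by sweeping a decision threshold over the observed scores. It is not the CC-PSDA algorithm: the function names (`length_normalize`, `cosine_score`, `equal_error_rate`), the cosine scoring rule, and the threshold sweep are assumptions made for this example and are not taken from the paper.

```python
import math


def length_normalize(vec):
    """Scale a feature vector to unit Euclidean norm (unit hypersphere)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def cosine_score(a, b):
    """Cosine similarity: dot product of the two length-normalized vectors.

    Hypothetical stand-in for a backend score; the paper's backend is PSDA.
    """
    a_n, b_n = length_normalize(a), length_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false acceptance rate (FAR) and the false
    rejection rate (FRR) are closest, and reported as their mean.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With toy scores such as `equal_error_rate([0.9, 0.8, 0.3], [0.7, 0.2, 0.1])`, the sweep finds a threshold at which one of three target trials is rejected and one of three non-target trials is accepted. Production evaluations typically interpolate the detection error trade-off curve instead of picking the nearest observed threshold.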

Key words: speaker recognition, i-vector, Probabilistic Spherical Discriminant Analysis (PSDA), channel compensation, von Mises-Fisher (VMF) distribution, length normalization
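
For readers unfamiliar with two of the terms above, length normalization and the VMF distribution can be written in their standard textbook form as follows. The symbols ($\mathbf{x}$ for a speaker feature vector, $d$ for its dimension, $\boldsymbol{\mu}$ for the mean direction, $\kappa$ for the concentration) are assumptions chosen for this illustration; the abstract does not state the paper's own notation or parameterization.

```latex
% Length normalization: project a feature vector onto the unit hypersphere S^{d-1}
\[
\tilde{\mathbf{x}} = \frac{\mathbf{x}}{\lVert \mathbf{x} \rVert_2},
\qquad \tilde{\mathbf{x}} \in S^{d-1} \subset \mathbb{R}^{d}
\]

% VMF density on S^{d-1}, with unit mean direction \mu and concentration \kappa > 0;
% I_{\nu} is the modified Bessel function of the first kind of order \nu
\[
f(\tilde{\mathbf{x}};\, \boldsymbol{\mu}, \kappa)
  = C_d(\kappa)\, \exp\!\left(\kappa\, \boldsymbol{\mu}^{\top} \tilde{\mathbf{x}}\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}
\]
```

A larger $\kappa$ concentrates the density more tightly around $\boldsymbol{\mu}$, and the density depends on $\tilde{\mathbf{x}}$ only through the cosine $\boldsymbol{\mu}^{\top} \tilde{\mathbf{x}}$, which is why the distribution is a natural fit for features that live on a hypersphere.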

CLC number: