基于概率球面判别分析的说话人识别信道补偿算法

doi:10.11772/j.issn.1001-9081.2023020157

《计算机应用》唯一官方网站 ›› 2024, Vol. 44 ›› Issue (2): 556-562.DOI: 10.11772/j.issn.1001-9081.2023020157

所属专题：多媒体计算与计算机仿真

• 多媒体计算与计算机仿真 • 上一篇下一篇

基于概率球面判别分析的说话人识别信道补偿算法

景维鹏, 肖庆欣, 罗辉()

东北林业大学信息与计算机工程学院，哈尔滨 150006

收稿日期:2023-02-21 修回日期:2023-04-18 接受日期:2023-04-21 发布日期:2023-08-14 出版日期:2024-02-10
通讯作者: 罗辉
作者简介:景维鹏（1979—），男，黑龙江鹤岗人，教授，博士，CCF高级会员，主要研究方向：人工智能
肖庆欣（1999—），男，山东泰安人，硕士研究生，主要研究方向：说话人识别；
基金资助:
国家自然科学基金资助项目(62101114)

Channel compensation algorithm for speaker recognition based on probabilistic spherical discriminant analysis

Weipeng JING, Qingxin XIAO, Hui LUO()

School of Information and Computer Engineering，Northeast Forestry University，Harbin Heilongjiang 150006，China

Received:2023-02-21 Revised:2023-04-18 Accepted:2023-04-21 Online:2023-08-14 Published:2024-02-10
Contact: Hui LUO
About author:JING Weipeng， born in 1979， Ph. D.， professor. His research interests include artificial intelligence.
XIAO Qingxin， born in 1999， M. S. candidate. His research interests include speaker recognition.
Supported by:
National Natural Science Foundation of China(62101114)

摘要/Abstract

摘要：

在说话人识别任务中，概率线性判别分析（PLDA）模型是目前常用的分类后端，但由于高斯PLDA模型分布假设不能准确拟合真实说话人特征分布，导致基于高斯分布假设长度归一化的信道补偿方法会破坏说话人特征类内分布的独立性，使得高斯PLDA不能充分利用上游任务提取特征所包含的说话人信息，从而影响识别结果。针对这一问题，提出基于概率球面判别分析的信道补偿算法（CC-PSDA），通过引入冯·米塞斯-费希尔（VMF）分布假设的概率球面判别分析模型（PSDA）和特征变换方法代替高斯分布假设的概率线性判别分析方法，以避免信道补偿对说话人特征类内分布独立性的影响。首先，为了使说话人特征符合VMF分布先验假设拟合后端分类模型，在特征级利用非线性转换对说话人特征进行分布变换。之后，利用基于VMF分布假设的PLDA模型不会破坏说话人特征的类内分布结构的特点，将变换后的说话人特征定义到特定维度的超球面，最大化特征类间距离。所提算法通过期望最大化（EM）算法进行求解，最终完成分类任务。实验结果表明，改进算法在三个测试集上的识别等错误率相较于对比模型PSDA、高斯PLDA均最低。由此可见，所提模型可以有效区分说话人特征，提高识别性能。

关键词: 说话人识别, i-vector, 概率球面判别分析, 信道补偿, 冯·米塞斯-费希尔分布, 长度归一化

Abstract:

In speaker recognition tasks， the Probabilistic Linear Discriminant Analysis （PLDA） model is a commonly used classification backend. However， due to the inaccurate fitting of the real speaker feature distribution by the distribution assumption of Gaussian PLDA model， length normalization-based channel compensation methods based on the Gaussian distribution assumption may destroy the independence of the within-class distribution of speaker features， making the Gaussian PLDA unable to fully utilize the speaker information contained in the upstream task feature extraction， thereby affecting the recognition results. To address this issue， a Channel Compensation algorithm for speaker recognition based on Probabilistic Spherical Discriminant Analysis（CC-PSDA） was proposed， which introduced a Probabilistic Spherical Discriminant Analysis （PSDA） model with Von Mises-Fisher （VMF） distribution assumption and a feature transformation method to replace the PLDA method based on the Gaussian distribution assumption， for avoiding the impact of channel compensation on the independence of the within-class distribution of speaker features. Firstly，in order to make the speaker features conform to the VMF distribution prior assumption and fit the backend classification model，a nonlinear transformation was used to transform the distribution of the speaker features at the feature level. Then， by utilizing the characteristic of the PSDA model based on the VMF distribution assumption that does not destroy the within-class distribution structure of speaker features， the transformed speaker features were defined on a hypersphere of a specific dimension， maximizing the inter-class distance of features. The proposed model was solved by the EM （Expectation Maximum） algorithm， and the classification task was ultimately completed. Experimental results show that the improved algorithm has the lowest recognition equal error rates compared to the PSDA and Gaussian PLDA models on three test sets. Therefore， the proposed algorithm can effectively distinguish speaker features and improve recognition performance.

Key words: speaker recognition, i-vector, Probabilistic Spherical Discriminant Analysis (PSDA), channel compensation, Von Mises- Fisher (VMF) distribution, length normalization

中图分类号:

TP391.4

景维鹏, 肖庆欣, 罗辉. 基于概率球面判别分析的说话人识别信道补偿算法[J]. 计算机应用, 2024, 44(2): 556-562.

Weipeng JING, Qingxin XIAO, Hui LUO. Channel compensation algorithm for speaker recognition based on probabilistic spherical discriminant analysis[J]. Journal of Computer Applications, 2024, 44(2): 556-562.

图/表 9

参考文献 26

1	陈晨，韩纪庆.说话人识别方法综述［J］.智能计算机与应用，2015， 5（5）：92-94，97. 10.3969/j.issn.2095-2163.2015.05.027
	CHEN C， HAN J Q. An overview of speaker recognition ［J］. Intelligent Computer and Applications， 2015， 5（5）：92-94，97. 10.3969/j.issn.2095-2163.2015.05.027
2	MATEJU L， KYNYCH F， CERVA P， et al. Overlapped speech detection in broadcast streams using x-vectors［C］// Proceedings of the Interspeech 2022. Baixas， France：International Speech Communication Association， 2022： 4606-4610. 10.21437/interspeech.2022-81
3	DEHAK N， KENNY P J， DEHAK R， et al. Front-end factor analysis for speaker verification［J］. IEEE Transactions on Audio， Speech， and Language Processing， 2010， 19（4）： 788-798. 10.1109/tasl.2010.2064307
4	SNYDER D， GARCIA-ROMERO D， SELL G， et al. X-vectors： Robust DNN embeddings for speaker recognition［C］// Proceedings of the 2018 IEEE International Conference on Acoustics， Speech and Signal Processing. Piscataway： IEEE， 2018： 5329-5333. 10.1109/icassp.2018.8461375
5	PRINCE S J D， ELDER J H. Probabilistic linear discriminant analysis for inferences about identity［C］// Proceedings of the 2007 IEEE 11th International Conference on Computer Vision. Piscataway： IEEE， 2007： 1-8. 10.1109/iccv.2007.4409052
6	GARCIA-ROMERO D， ZHOU X， ESPY-WILSON C Y. Multicondition training of Gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition［C］// Proceedings of the 2012 IEEE International Conference on Acoustics， Speech and Signal Processing. Piscataway： IEEE， 2012： 4257-4260. 10.1109/icassp.2012.6288859
7	CAI W， CHEN J， LI M. Analysis of length normalization in end-to-end speaker verification system［EB/OL］. （2018-06-11）［2022-07-23］. . 10.21437/interspeech.2018-92
8	MATĚJKA P， GLEMBEK O， CASTALDO F， et al. Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification［C］// Proceedings of the 2011 IEEE Conference on Acoustics， Speech and Signal Processing. Piscataway： IEEE， 2011： 4828-4831. 10.1109/icassp.2011.5947436
9	BRÜMMER N， SWART A， MOŠNER L， et al. Probabilistic spherical discriminant analysis： an alternative to PLDA for length-normalized embeddings ［EB/OL］. （2022-03-28）［2022-09-10］. . 10.21437/interspeech.2022-731
10	罗家诚.基于改进信道补偿的I-vector说话人识别［J］.电子设计工程，2021，29（20）：96-100. 10.14022/j.issn1674-6236.2021.20.020
	LUO J C. Improving channel compensation for I-vector based speaker recognition ［J］. Electronic Design Engineering，2021，29（20）：96-100. 10.14022/j.issn1674-6236.2021.20.020
11	QIN X， BU H， LI M. HI-MIA： a far-field text-dependent speaker verification database and the baselines［C］// Proceedings of the 2020 IEEE International Conference on Acoustics， Speech and Signal Processing. Piscataway： IEEE， 2020： 7609-7613. 10.1109/icassp40776.2020.9054423
12	王明合，唐振民，张二华.基于i-vector局部加权线性判别分析的说话人识别［J］. 仪器仪表学报， 2015， 36（12）： 2842-2848. 10.3969/j.issn.0254-3087.2015.12.026
	WANG M H， TANG Z M， ZHANG E H. I-vector based speaker recognition using local weighted linear discriminant analysis［J］. Chinese Journal of Scientific Instrument， 2015， 36（12）： 2842-2848. 10.3969/j.issn.0254-3087.2015.12.026
13	DING K. A note on Kaldi’s PLDA implementation ［EB/OL］. （2018-04-02）［2023-04-15］. . 10.48550/arXiv.1804.00403
14	HE M， RAJ D， HUANG Z， et al. Target-speaker voice activity detection with improved i-vector estimation for unknown number of speaker ［EB/OL］. （2021-08-07）［2022-07-13］. . 10.21437/interspeech.2021-750
15	JONES M C， PEWSEY A. Sinh-arcsinh distributions［J］. Biometrika， 2009， 96（4）： 761-780. 10.1093/biomet/asp053
16	茅正冲，王俊俊，黄舒伟.基于PLDA信道补偿的说话人识别算法［J］. 计算机与数字工程， 2019， 47（11）： 2757-2762. 10.3969/j.issn.1672-9722.2019.11.023
	MAO Z C， WANG J J， HUANG S W. Speaker compensation algorithm based on channel compensation［J］. Computer and Digital Engineering， 2019， 47（11）： 2757-2762. 10.3969/j.issn.1672-9722.2019.11.023
17	SILNOVA A， BRUMMER N， GARCIA-ROMERO D， et al. Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors ［EB/OL］. （2018-03-24）［2022-06-15］. . 10.21437/interspeech.2018-2128
18	ZHAO M， MA Y， LIU M， et al. The speakIn system for VoxCeleb speaker recognition challange 2021 ［EB/OL］. （2021-09-05）［2022-04-11］. . 10.48550/arXiv.2109.01989
19	BANERJEE A， DHILLON I S， GHOSH J， et al. Clustering on the unit hypersphere using Von Mises-Fisher distributions［J］. The Journal of Machine Learning Research， 2005， 6： 1345-1382. 10.1007/s10846-005-9023-3
20	LIU Y， LI Z， LI L， et al. Phoneme-aware and channel-wise attentive learning for text dependentspeaker verification ［EB/OL］. （2021-06-25）［2023-04-15］. . 10.21437/interspeech.2021-2137
21	ZHU G， JIANG F， DUAN Z. Y-vector： multiscale waveform encoder for speaker embedding ［EB/OL］. （2021-06-09）［2022-04-25］. . 10.21437/interspeech.2021-1707
22	RAJAN R， KUMAR A V， BABU B P. Poetic meter classification using i-Vector-MTF fusion［C］// Proceedings of the 21st Annual Conference of the International Speech Communication Association. Baixas， France：International Speech Communication Association， 2020： 145-149. 10.21437/interspeech.2020-1794
23	肜娅峰，陈晨，陈德运，等.基于贝叶斯主成分分析的i-vector说话人确认方法［J］.电子学报，2021，49（11）：2186-2194. 10.12263/DZXB.20200476
	RONG Y F， CHEN C， CHEN D Y， et al. Bayesian principal component analysis for i-vector speaker verification［J］. Acta Electronica Sinica， 2021， 49（11）： 2186-2194. 10.12263/DZXB.20200476
24	CHEN Z， LIN Y. Improving x-vector and PLDA for text-dependent speaker verification［C］// Proceedings of the 21st Annual Conference of the International Speech Communication Association. Baixas， France：International Speech Communication Association， 2020： 726-730. 10.21437/interspeech.2020-1188
25	NGUYEN H V， BAI L. Cosine similarity metric learning for face verification［C］// Proceedings of the 10th Asian Conference on Computer Vision. Berlin： Springer， 2011： 709-720. 10.1007/978-3-642-19309-5_55
26	OKABE K， KOSHINAKA T， SHINODA K. Attentive statistics pooling for deep speaker embedding ［EB/OL］. （2019-02-25）［2023-01-15］. . 10.21437/interspeech.2018-993

模型	100维i-vector		200维i-vector		400维i-vector
模型	EER	MinDCF	EER	MinDCF	EER	MinDCF
cos	4.87	0.352	4.28	0.324	4.15	0.321
PLDA	4.85	0.355	4.27	0.332	4.10	0.347
HT-PLDA	4.68	0.339	4.19	0.355	4.11	0.343
PSDA	4.62	0.336	4.17	0.346	4.09	0.352
CC-PSDA	4.23	0.314	3.88	0.286	3.63	0.249

模型	100维i-vector		200维i-vector		400维i-vector
模型	EER	MinDCF	EER	MinDCF	EER	MinDCF
cos	4.87	0.352	4.28	0.324	4.15	0.321
PLDA	4.85	0.355	4.27	0.332	4.10	0.347
HT-PLDA	4.68	0.339	4.19	0.355	4.11	0.343
PSDA	4.62	0.336	4.17	0.346	4.09	0.352
CC-PSDA	4.23	0.314	3.88	0.286	3.63	0.249

模型	100维i-vector		200维i-vector		400维i-vector
模型	EER	MinDCF	EER	MinDCF	EER	MinDCF
cos	6.90	0.468	6.74	0.455	5.96	0.470
PLDA	6.86	0.472	6.71	0.551	5.95	0.488
HT-PLDA	6.51	0.464	6.44	0.452	5.93	0.468
PSDA	6.40	0.466	6.35	0.443	5.84	0.429
CC-PSDA	5.84	0.420	5.67	0.411	5.41	0.397

模型	100维i-vector		200维i-vector		400维i-vector
模型	EER	MinDCF	EER	MinDCF	EER	MinDCF
cos	6.90	0.468	6.74	0.455	5.96	0.470
PLDA	6.86	0.472	6.71	0.551	5.95	0.488
HT-PLDA	6.51	0.464	6.44	0.452	5.93	0.468
PSDA	6.40	0.466	6.35	0.443	5.84	0.429
CC-PSDA	5.84	0.420	5.67	0.411	5.41	0.397

模型	100维i-vector		200维i-vector		400维i-vector
模型	EER	MinDCF	EER	MinDCF	EER	MinDCF
cos	11.24	0.625	10.43	0.615	9.86	0.536
PLDA	10.98	0.611	10.21	0.623	9.79	0.535
HT-PLDA	9.96	0.573	9.62	0.633	8.47	0.520
PSDA	9.87	0.598	9.40	0.572	8.23	0.493
CC-PSDA	8.65	0.511	7.69	0.506	7.14	0.447

基于概率球面判别分析的说话人识别信道补偿算法

Channel compensation algorithm for speaker recognition based on probabilistic spherical discriminant analysis

RichHTML

PDF

可视化

摘要/Abstract

引用本文

使用本文

图/表 9

参考文献 26

相关文章 15

编辑推荐

Metrics

模型	EER	MinDCF
TDNN+PLDA	8.31	0.513
TDNN+cos	7.13	0.303
TDNN+PSDA	7.05	0.317
TDNN+CC-PSDA	6.87	0.222

迭代次数	VoxCeleb1-O		VoxCeleb1-E		VoxCeleb1-H
迭代次数	变换前	变换后	变换前	变换后	变换前	变换后
初值	100.00	100.00	100.00	100.00	100.00	100.00
1	60.87	80.23	85.91	80.10	80.43	88.42
2	40.91	73.10	70.66	79.21	89.16	79.90
3	39.66	45.87	60.78	66.89	77.94	63.27
4	38.36	30.17	48.13	54.21	67.09	55.81
5	37.11	29.68	50.32	45.90	52.43	49.33
6	37.28	29.80	45.12	41.01	51.88	48.84
7	37.28	29.80	44.65	39.82	50.29	48.63
8	37.28	29.80	44.65	39.77	50.01	48.63
9	37.28	29.80	44.65	39.77	50.01	48.63
10	37.28	29.80	44.65	39.77	50.01	48.63

[1]	方昕, 黄泽鑫, 张聿晗, 高天, 潘嘉, 付中华, 高建清, 刘俊华, 邹亮. 基于时域波形的半监督端到端虚假语音检测方法[J]. 《计算机应用》唯一官方网站, 2023, 43(1): 227-231.
[2]	牛晓可, 黄伊鑫, 徐华兴, 蒋震阳. 基于听皮层神经元感受野的强噪声环境下说话人识别[J]. 计算机应用, 2020, 40(10): 3034-3040.
[3]	孙念, 张毅, 林海波, 黄超. 基于多特征i-vector的短语音说话人识别算法[J]. 计算机应用, 2018, 38(10): 2839-2843.
[4]	马新军, 吴晨晨, 仲乾元, 李园园. 基于SIFT的说话人唇动识别[J]. 计算机应用, 2017, 37(9): 2694-2699.
[5]	黄文娜, 彭亚雄, 贺松. 基于MAP+CMLLR的说话人识别中发声力度问题[J]. 计算机应用, 2017, 37(3): 906-910.
[6]	汪海彬, 余正涛, 毛存礼, 郭剑毅. SMFCC:一种新的语音信号特征提取方法[J]. 计算机应用, 2016, 36(6): 1735-1740.
[7]	谢小娟, 曾以成, 熊冰峰. 说话人识别中基于Fisher比的特征组合方法[J]. 计算机应用, 2016, 36(5): 1421-1425.
[8]	张俊, 关胜晓. 基于改进的最大后验概率矢量量化和最小二乘支持向量机集成算法[J]. 计算机应用, 2015, 35(7): 2101-2104.
[9]	储雯李银国徐洋孟祥涛. 基于段级特征主成分分析的说话人识别算法[J]. 计算机应用, 2013, 33(07): 1935-1937.
[10]	胡峰松张璇. 基于梅尔频率倒谱系数与翻转梅尔频率倒谱系数的说话人识别方法[J]. 计算机应用, 2012, 32(09): 2542-2544.
[11]	何伟徐阳张玲. 基于SOPC的说话人识别算法的实现与优化[J]. 计算机应用, 2012, 32(05): 1463-1466.
[12]	何亮刘加. 基于线性对数似然核函数的说话人识别[J]. 计算机应用, 2011, 31(08): 2083-2086.
[13]	高会贤马全福郑晓势. 短语音噪声环境下说话人识别特征提取[J]. 计算机应用, 2010, 30(10): 2712-2714.
[14]	亢明汪成亮陈娟娟. 基于动态阈值失量量化的说话人识别[J]. 计算机应用, 2009, 29(1): 146-148.
[15]	许允喜陈方. 基于混合微粒群算法的说话人识别[J]. 计算机应用, 2008, 28(6): 1546-1548.