Journal of Computer Applications, 2023, Vol. 43, Issue 12: 3727-3732. DOI: 10.11772/j.issn.1001-9081.2022121902

• Artificial intelligence •

Text-independent speaker verification method based on uncertainty learning

Yulian ZHANG, Shanshan YAO, Chao WANG, Jiang CHANG

  1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030006, China
  • Received: 2022-12-29 Revised: 2023-03-07 Accepted: 2023-03-08 Online: 2023-03-17 Published: 2023-12-10
  • Contact: Shanshan YAO
  • About author: ZHANG Yulian, born in 1997, M.S. candidate. Her research interests include voiceprint recognition.
    WANG Chao, born in 1995, M.S. candidate. His research interests include voiceprint recognition.
    CHANG Jiang, born in 1988, Ph.D., lecturer. Her research interests include speech sentiment analysis.
  • Supported by:
    National Natural Science Foundation of China (61906115); Shanxi Province Science Foundation for Youths (20210302124556)

基于不确定性学习的文本无关的说话人确认方法

张玉莲, 姚姗姗, 王超, 畅江

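  1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006
  • Corresponding author: Shanshan YAO
  • About the authors: ZHANG Yulian (born 1997), female, from Jincheng, Shanxi, M.S. candidate. Main research interest: voiceprint recognition.
    WANG Chao (born 1995), male, from Datong, Shanxi, M.S. candidate. Main research interest: voiceprint recognition.
    CHANG Jiang (born 1988), female, from Yuncheng, Shanxi, lecturer, Ph.D. Main research interest: speech sentiment analysis.
  • Funding:
    National Natural Science Foundation of China (61906115); Shanxi Province Science Foundation for Youths (20210302124556)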

Abstract:

The speaker verification task aims to determine whether an enrollment utterance and a test utterance belong to the same speaker. The voiceprint features extracted by speaker recognition systems are often disturbed by factors unrelated to speaker identity, which seriously degrades system accuracy. To address this problem, a Text-Independent Speaker Verification (TISV) method based on Uncertainty Learning (UL) was proposed. Firstly, uncertainty was introduced into the speaker backbone network so that the voiceprint feature (mean) and the uncertainty of the speech data (variance) were learned simultaneously, thereby modeling the uncertainty in the speech dataset. Then, a distribution representation of the features was obtained by a resampling technique. Finally, KL (Kullback-Leibler) divergence regularization was added to the speaker classification loss to constrain the distribution of the noise, which resolved the degradation problem that arises when the classification loss is computed. Experimental results show that, when trained on the VoxCeleb1 and VoxCeleb2 development sets respectively and tested on the VoxCeleb1-O test set, the proposed model reduces the Equal Error Rate (EER) by 9.9% and 10.4% and the minimum Detection Cost Function (minDCF) by 10.9% and 4.5%, compared with the deterministic baseline model Thin ResNet34. These results indicate that the proposed method improves accuracy in noisy and unconstrained scenarios.

Key words: speaker verification, data uncertainty, distribution embedding, AAM-softmax (Additive Angular Margin-softmax), KL (Kullback-Leibler) divergence
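
The three steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the "resampling technique" is the reparameterization trick z = mu + sigma * eps, that the KL term is taken against a standard normal prior N(0, I), and that the total loss has the form L_AAM + lambda * KL. The function names, the margin, the scale, and `kl_weight` are hypothetical values chosen for illustration, and the margin handling is simplified (no special case when theta + margin exceeds pi).

```python
import numpy as np


def sample_embedding(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (assumed resampling step)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over dimensions, averaged over the batch."""
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
    return kl.mean()


def aam_softmax_loss(z, weights, labels, margin=0.2, scale=30.0):
    """Simplified AAM-softmax: add an angular margin to the target-class angle."""
    z_n = z / np.linalg.norm(z, axis=1, keepdims=True)
    w_n = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(z_n @ w_n.T, -1.0, 1.0)          # cosine to every class centre
    rows = np.arange(len(labels))
    logits = cos.copy()
    logits[rows, labels] = np.cos(np.arccos(cos[rows, labels]) + margin)
    logits *= scale
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()


def uncertainty_loss(mu, log_var, weights, labels, rng, kl_weight=0.01):
    """Classification loss on the sampled embedding plus the KL regularizer."""
    z = sample_embedding(mu, log_var, rng)
    return aam_softmax_loss(z, weights, labels) + kl_weight * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = rng.standard_normal((4, 8))        # stand-in for the backbone's mean output
    log_var = rng.standard_normal((4, 8))   # stand-in for the backbone's log-variance output
    weights = rng.standard_normal((5, 8))   # one class centre per speaker
    labels = np.array([0, 1, 2, 3])
    print(uncertainty_loss(mu, log_var, weights, labels, rng))
```

Without the KL term, the classification loss alone could push the predicted variance toward zero, so that the sampled embedding collapses back to the mean; this is one plausible reading of the "degradation problem" the abstract mentions.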

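Abstract (translated from Chinese):

The speaker verification task aims to determine whether an enrollment utterance and a test utterance belong to the same speaker. To address the problem that voiceprint features extracted by speaker recognition systems are often disturbed by factors unrelated to identity information, which seriously degrades system accuracy, a Text-Independent Speaker Verification (TISV) method based on Uncertainty Learning (UL) is proposed. First, uncertainty is introduced into the speaker backbone network so that the voiceprint feature (mean) and the uncertainty of the utterance data (variance) are learned simultaneously, thereby modeling the uncertainty in the speech dataset. Second, a distribution representation of the features is obtained through a resampling trick. Finally, KL divergence regularization is introduced into the speaker classification loss to constrain the noise distribution, which resolves the degradation problem that arises when the classification loss is computed. Experimental results show that, when the training set is the VoxCeleb1 or the VoxCeleb2 development set, the proposed model reduces the Equal Error Rate (EER) on the VoxCeleb1-O test set by 9.9% and 10.4%, respectively, and the minimum Detection Cost Function (minDCF) by 10.9% and 4.5%, respectively, compared with the deterministic Thin ResNet34 model. The proposed method therefore improves accuracy in noisy, unconstrained scenarios.

Key words: speaker verification, data uncertainty, distribution embedding, AAM-softmax, KL divergence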

CLC Number: