Journal of Computer Applications (《计算机应用》), sole official website


Anti-spoofing speaker verification method utilizing global-local feature dependency

Jialin ZHANG (张嘉琳)1, Qinghua REN (任庆桦)1, Qirong MAO (毛启容)2

  1. Jiangsu University
  2. School of Computer Science and Communication Engineering, Jiangsu University
  • Received: 2024-01-08 Revised: 2024-02-27 Online: 2024-04-01 Published: 2024-04-01
  • Corresponding author: Qirong MAO
  • Funding:
    Research and development of key technologies for intelligent voice interaction in complex environments supporting dialects and emotions

Abstract: Existing anti-spoofing speaker verification methods, which rely mainly on convolutional models, capture global feature dependencies poorly. To address this problem, an anti-spoofing speaker verification method utilizing global-local feature dependency was proposed. First, for the spoofing speech detection module, two filter combinations were designed to filter the original speech, and the samples were augmented by masking frequency sub-bands. Second, a multidimensional global attention mechanism was proposed: the channel, frequency, and time dimensions were pooled separately to obtain the global dependencies of each dimension, and this global information was fused with the original features by weighting. Finally, for the speaker verification part, a Statistical Pyramid Dense Time Delay Neural Network (SPD-TDNN) was introduced, which computes the standard deviation of the features to add global information while extracting multi-scale time-frequency features. The results show that, compared with the Audio Anti-Spoofing using Integrated Spectro-Temporal graph attention networks (AASIST) model, the proposed spoofing speech detection method reduces the equal error rate (EER) by 53% on the ASVspoof2019 dataset. Compared with the SPD-TDNN speaker verification method used alone, the proposed anti-spoofing speaker verification method reduces the EER by 23 percentage points. These results verify that the proposed method achieves better classification performance by exploiting global feature dependency.

Keywords: speaker verification, data augmentation, frequency masking, attention mechanism, synthetic speech detection
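The multidimensional global attention described in the abstract (pool along the channel, frequency, and time dimensions separately, then fuse the resulting global information with the original features by weighting) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `multidim_global_attention`, the use of average pooling, the parameter-free sigmoid gate, and the multiplicative fusion are all assumptions made for illustration. A real model would most likely include learned layers between the pooling and the gate.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here as a parameter-free gate (an assumption)."""
    return 1.0 / (1.0 + np.exp(-z))


def multidim_global_attention(x):
    """Reweight a feature map with per-dimension global descriptors.

    x: array of shape (channels, freq, time).
    Returns the reweighted features and the three weight vectors.
    Hypothetical sketch: average pooling and sigmoid gating are assumed.
    """
    if x.ndim != 3:
        raise ValueError("expected an array of shape (channels, freq, time)")

    # Global descriptor for each dimension: average over the other two axes.
    w_c = sigmoid(x.mean(axis=(1, 2)))  # shape (channels,)
    w_f = sigmoid(x.mean(axis=(0, 2)))  # shape (freq,)
    w_t = sigmoid(x.mean(axis=(0, 1)))  # shape (time,)

    # Broadcast each weight vector back over the feature map and fuse
    # it with the original features by multiplication.
    out = x * w_c[:, None, None] * w_f[None, :, None] * w_t[None, None, :]
    return out, (w_c, w_f, w_t)


# Usage on a random feature map (4 channels, 6 frequency bins, 10 frames).
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 6, 10))
reweighted, (w_c, w_f, w_t) = multidim_global_attention(features)
print(reweighted.shape, w_c.shape, w_f.shape, w_t.shape)
```

Because every gate value lies strictly between 0 and 1, this particular fusion can only attenuate features. A residual connection (adding the reweighted map back to the input) would be one alternative design, and the abstract does not say which one the paper uses.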

CLC number: