Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (3): 906-910. DOI: 10.11772/j.issn.1001-9081.2017.03.906

• Frontier, Interdisciplinary and Comprehensive Applications •

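The vocal effort problem in speaker recognition based on MAP+CMLLR

HUANG Wenna, PENG Yaxiong, HE Song

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025
  • Received: 2016-07-22 Revised: 2016-09-17 Online: 2017-03-10 Published: 2017-03-22
  • Corresponding author: PENG Yaxiong
  • About the authors: HUANG Wenna (born 1990), female, from Chishui, Guizhou, M.S. candidate; main research interest: speaker recognition. PENG Yaxiong (born 1963), male, from Zunyi, Guizhou, associate professor; main research interest: signal processing. HE Song (born 1970), male, from Guiyang, Guizhou, associate professor, M.S.; main research interest: signal processing.
  • Supported by:
    Social Research Plan of Guizhou Province (Qian Ke He SY Zi [2013] No. 3105); Engineering Technology Research Center Construction Project of Guizhou Province (Qian Ke He G Zi [2014] No. 4002).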

Vocal effort in speaker recognition based on MAP+CMLLR

HUANG Wenna, PENG Yaxiong, HE Song   

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China
  • Received: 2016-07-22 Revised: 2016-09-17 Online: 2017-03-10 Published: 2017-03-22
  • Supported by:
    This work is partially supported by the Social Research Plan of Guizhou Province (20133015) and the Engineering Technology Research Center Construction Project of Guizhou Province (20144002).

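Abstract: To reduce the effect of vocal effort on speaker recognition performance, and assuming that the training speech contains a small amount of whispered and shouted data, a method combining Maximum A Posteriori (MAP) adaptation with Constrained Maximum Likelihood Linear Regression (CMLLR) is proposed to update the speaker model and to project and transform the speaker features. MAP adaptation updates the speaker model trained on normal speech, while CMLLR feature-space projection transforms the features of whispered and shouted test speech, which reduces the mismatch between training and test speech. Experimental results show that the Equal Error Rate (EER) of the speaker recognition system drops markedly with MAP+CMLLR: compared with the baseline system, MAP adaptation, Maximum Likelihood Linear Regression (MLLR) model projection, and CMLLR feature-space projection, the average EER of MAP+CMLLR is lower by 75.3%, 3.5%, 72%, and 70.9%, respectively. The results indicate that the proposed method weakens the effect of vocal effort on speaker discriminability and makes the speaker recognition system more robust to changes in vocal effort.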
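As a reading aid for the metric quoted in the abstract, the following is a minimal Python sketch of how an equal error rate can be estimated from verification scores. It is not taken from the paper: the function names, the threshold sweep, and the toy scores are illustrative assumptions. The `relative_reduction` helper further assumes that the percentages in the abstract are relative reductions, which the abstract does not state explicitly.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER: the point where false-accept and false-reject rates meet.

    Hypothetical helper for illustration; higher scores are assumed to mean
    "more likely the claimed speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False reject: a target trial scores below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False accept: an impostor trial scores at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0)


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent (assumed interpretation, see lead-in)."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy scores, invented purely for demonstration.
    eer = equal_error_rate([1.0, 3.0], [0.0, 2.0])
    print(f"EER on toy scores: {eer:.2f}")
```

A finer estimate would interpolate between neighbouring thresholds; the nearest-point rule is used here only to keep the sketch short.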

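Keywords: speaker recognition, vocal effort, Maximum A Posteriori (MAP), Maximum Likelihood Linear Regression (MLLR), Constrained Maximum Likelihood Linear Regression (CMLLR)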

Abstract: To improve the performance of a speaker recognition system affected by changes in vocal effort, and given a small amount of whispered and shouted speech in the training data, Maximum A Posteriori (MAP) adaptation and Constrained Maximum Likelihood Linear Regression (CMLLR) were combined to update the speaker model and transform the speaker features. MAP adaptation was used to update the speaker model trained on normal speech, and CMLLR feature-space projection was used to project and transform the features of whispered and shouted test speech, reducing the mismatch between training and test speech. Experimental results show that the proposed method significantly reduces the Equal Error Rate (EER) of the speaker recognition system. Compared with the baseline system, the MAP adaptation method, the Maximum Likelihood Linear Regression (MLLR) model projection method and the CMLLR feature-space projection method, the average EER is reduced by 75.3%, 3.5%, 72% and 70.9%, respectively. These results indicate that the proposed method weakens the influence of vocal effort on speaker discriminability and makes the speaker recognition system more robust to vocal effort variability.
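The two building blocks named in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes a diagonal-covariance GMM speaker model, uses the common relevance-factor form of MAP mean adaptation with an assumed relevance factor of 16, and takes the CMLLR transform `(A, b)` as given rather than estimating it by maximum likelihood. All function and parameter names are hypothetical.

```python
import numpy as np


def gmm_posteriors(X, weights, means, variances):
    """Component responsibilities of a diagonal-covariance GMM.

    X: (T, D) frames; weights: (K,); means, variances: (K, D).
    """
    diff = X[:, None, :] - means[None, :, :]                       # (T, K, D)
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                              # (T, K)
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)                # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """MAP-adapt only the GMM means toward adaptation data X.

    relevance=16.0 is an assumed value, not one reported in the source.
    """
    post = gmm_posteriors(X, weights, means, variances)
    n = post.sum(axis=0)                                           # soft counts, (K,)
    data_mean = (post.T @ X) / np.maximum(n, 1e-10)[:, None]       # (K, D)
    alpha = (n / (n + relevance))[:, None]                         # data-vs-prior weight
    return alpha * data_mean + (1.0 - alpha) * means


def apply_cmllr(X, A, b):
    """Apply a given feature-space affine transform x' = A x + b to each frame."""
    return X @ A.T + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = np.array([0.5, 0.5])
    means = np.array([[0.0, 0.0], [4.0, 4.0]])
    variances = np.ones((2, 2))
    adaptation_frames = rng.normal(loc=1.0, scale=1.0, size=(200, 2))
    print(map_adapt_means(adaptation_frames, weights, means, variances))
    print(apply_cmllr(adaptation_frames[:2], np.eye(2), np.zeros(2)))
```

In this sketch the model side (MAP) and the feature side (CMLLR) are kept as separate functions, mirroring the division of labour the abstract describes: the speaker model is updated, while the test features are transformed.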

Key words: speaker recognition, vocal effort, Maximum A Posteriori (MAP), Maximum Likelihood Linear Regression (MLLR), Constrained Maximum Likelihood Linear Regression (CMLLR)

CLC Number: