Vocal effort in speaker recognition based on MAP+CMLLR

doi:10.11772/j.issn.1001-9081.2017.03.906

Journal of Computer Applications, 2017, Vol. 37, Issue (3): 906-910. DOI: 10.11772/j.issn.1001-9081.2017.03.906

• Frontier, interdisciplinary, and comprehensive applications •

HUANG Wenna, PENG Yaxiong, HE Song

College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China

Received: 2016-07-22  Revised: 2016-09-17  Online: 2017-03-10  Published: 2017-03-22
Corresponding author: PENG Yaxiong
About the authors: HUANG Wenna (born 1990), female, from Chishui, Guizhou, M.S. candidate; main research interest: speaker recognition. PENG Yaxiong (born 1963), male, from Zunyi, Guizhou, associate professor; main research interest: signal processing. HE Song (born 1970), male, from Guiyang, Guizhou, associate professor, M.S.; main research interest: signal processing.
Supported by:
This work is partially supported by the Social Research Plan of Guizhou Province (Qiankehe SY [2013] 3105) and the Engineering Technology Research Center Construction Project of Guizhou Province (Qiankehe G [2014] 4002).

Abstract

To reduce the impact of vocal effort on speaker recognition performance, and given that a small amount of whispered and shouted speech is available in the training data, a method combining Maximum A Posteriori (MAP) adaptation with Constrained Maximum Likelihood Linear Regression (CMLLR) is proposed to update the speaker models and to transform the speaker features. MAP adaptation updates the speaker models trained on normal speech, while the CMLLR feature-space projection transforms the features of whispered and shouted test speech, which reduces the mismatch between training and test speech. Experimental results show that the Equal Error Rate (EER) of the speaker recognition system drops markedly with MAP+CMLLR: compared with the baseline system, MAP adaptation, Maximum Likelihood Linear Regression (MLLR) model projection, and CMLLR feature-space projection, the average EER is reduced by 75.3%, 3.5%, 72%, and 70.9%, respectively. These results indicate that the proposed method weakens the effect of vocal effort on speaker discriminability and makes the speaker recognition system more robust to vocal effort variability.
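
The two components named in the abstract can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM speaker model and mean-only MAP adaptation with a relevance factor, neither of which the abstract specifies, and it only applies an affine CMLLR-style transform x' = A x + b whose parameters are taken as already estimated. All function names, the relevance value, and the toy data are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)                      # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation with a relevance factor (assumed variant)."""
    k, d = len(means), len(means[0])
    n = [0.0] * k                        # soft frame counts per component
    s = [[0.0] * d for _ in range(k)]    # posterior-weighted sums of frames
    for x in frames:
        for c, g in enumerate(posteriors(x, weights, means, variances)):
            n[c] += g
            for j in range(d):
                s[c][j] += g * x[j]
    adapted = []
    for c in range(k):
        alpha = n[c] / (n[c] + relevance)    # more data gives more weight to the data mean
        ex = [s[c][j] / n[c] for j in range(d)] if n[c] > 0 else means[c]
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(ex, means[c])])
    return adapted

def cmllr_transform(frames, A, b):
    """Apply an affine feature-space transform x' = A x + b to every frame."""
    return [[sum(a * xi for a, xi in zip(row, x)) + bi
             for row, bi in zip(A, b)] for x in frames]

# Toy data (made up): a two-component model and three adaptation frames.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0]]

new_means = map_adapt_means(frames, weights, means, variances, relevance=2.0)
projected = cmllr_transform(frames, A=[[1.0, 0.0], [0.0, 1.0]], b=[-0.5, -0.5])
```

In this sketch each adapted mean is pulled from its prior value toward the data assigned to that component, and the test features are shifted by the affine transform. A complete system would also estimate A and b by maximum likelihood and include the log-determinant of A when scoring transformed features.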

Key words: speaker recognition, vocal effort, Maximum A Posteriori (MAP), Maximum Likelihood Linear Regression (MLLR), Constrained Maximum Likelihood Linear Regression (CMLLR)
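
The evaluation metric in the abstract, the Equal Error Rate, is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from verification scores by sweeping thresholds, and how a relative reduction between two EER values would be computed. It assumes the percentages in the abstract are relative reductions, which the abstract does not state explicitly. The function names and all scores are hypothetical and are not the paper's data.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: target trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent (assumed form of the reported gains)."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer

# Made-up scores for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, impostors)

# Made-up EER values: going from 20% to 5% is a 75% relative reduction.
gain = relative_reduction(20.0, 5.0)
```

The threshold sweep is quadratic in the number of trials, which is fine for small score lists; larger evaluations would normally sort the scores once and accumulate the two error rates in a single pass.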

CLC Number: TP391.4

Cite this article: HUANG Wenna, PENG Yaxiong, HE Song. Vocal effort in speaker recognition based on MAP+CMLLR[J]. Journal of Computer Applications, 2017, 37(3): 906-910.

