Cross-corpus speech emotion recognition based on decision boundary optimized domain adaptation

Journal of Computer Applications, 2023, Vol. 43, Issue (2): 374-379. DOI: 10.11772/j.issn.1001-9081.2021122043

Topic: Artificial Intelligence

Yang WANG¹, Hongliang FU¹, Huawei TAO¹, Jing YANG¹, Yue XIE², Li ZHAO³

¹ Key Laboratory of Grain Information Processing and Control, Ministry of Education (Henan University of Technology), Zhengzhou Henan 450001, China
² School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing Jiangsu 211167, China
³ School of Information Science and Engineering, Southeast University, Nanjing Jiangsu 210096, China

Received: 2021-12-06 Revised: 2022-04-27 Accepted: 2022-05-11 Online: 2022-06-13 Published: 2023-02-10
Contact: Huawei TAO
About author: WANG Yang, born in 1999 in Xinyang, Henan, M. S. candidate, CCF member. His research interests include speech signal processing.
FU Hongliang, born in 1965 in Anyang, Henan, Ph. D., professor. His research interests include communication and information systems.
YANG Jing, born in 1983 in Shangqiu, Henan, Ph. D., associate professor. Her research interests include communication signal processing.
XIE Yue, born in 1991 in Huai'an, Jiangsu, Ph. D., lecturer. His research interests include artificial intelligence and affective computing.
ZHAO Li, born in 1958 in Nanjing, Jiangsu, Ph. D., professor. His research interests include speech signal processing and affective information processing.
Supported by:
National Natural Science Foundation of China (62001215); Natural Science Project of Education Department of Henan Province (21A120003); Start-up Fund for High-level Talents of Henan University of Technology (2018BS037)

Abstract:

Domain adaptation algorithms are widely used in cross-corpus speech emotion recognition. However, in pursuing a smaller domain discrepancy, many of these algorithms lose the discriminability of target-domain samples, so that the samples lie densely around the model's decision boundary and degrade its performance. To address this problem, a cross-corpus speech emotion recognition method based on Decision Boundary Optimized Domain Adaptation (DBODA) was proposed. First, the features were processed by a convolutional neural network. Then, the features were fed into the Maximum Nuclear-norm and Mean Discrepancy (MNMD) module, which maximizes the nuclear norm of the emotion prediction probability matrix of the target domain while reducing the inter-domain discrepancy, thereby enhancing the discriminability of target-domain samples and optimizing the decision boundary. In six cross-corpus experiments built on the Berlin, eNTERFACE and CASIA speech databases, the average recognition accuracy of the proposed method was 1.68 to 11.01 percentage points higher than those of the other algorithms, indicating that the proposed model effectively reduces the sample density around the decision boundary and improves prediction accuracy.

Key words: cross-corpus speech emotion recognition, convolutional neural network, decision boundary optimization, domain adaptation, feature distribution discrepancy
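The abstract describes the MNMD objective only in words: reduce the discrepancy between source and target features while maximizing the nuclear norm of the target-domain prediction probability matrix. The following NumPy sketch illustrates that idea under stated assumptions. The Gaussian kernel, the biased MMD estimate, the weights `mmd_weight` and `nuc_weight`, and the function name `mnmd_loss` are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dist = (
        (a ** 2).sum(axis=1)[:, None]
        + (b ** 2).sum(axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st


def batch_nuclear_norm(probs):
    """Nuclear norm (sum of singular values) of a batch-by-class matrix."""
    return np.linalg.svd(probs, compute_uv=False).sum()


def mnmd_loss(src_feat, tgt_feat, tgt_logits, mmd_weight=1.0, nuc_weight=1.0):
    """Hypothetical combined loss: minimize MMD, maximize the nuclear norm."""
    tgt_probs = softmax(tgt_logits)
    batch_size = tgt_probs.shape[0]
    return (
        mmd_weight * mmd2(src_feat, tgt_feat)
        - nuc_weight * batch_nuclear_norm(tgt_probs) / batch_size
    )


# Confident, diverse predictions give a larger nuclear norm than uniform ones.
confident = np.eye(5)[np.arange(16) % 5]   # 16 one-hot rows over 5 classes
uniform = np.full((16, 5), 0.2)            # every row is maximally uncertain
print(batch_nuclear_norm(confident), batch_nuclear_norm(uniform))
```

The nuclear-norm term enters with a negative sign, so minimizing the loss pushes target predictions toward confident and class-diverse outputs, which is the sense in which samples are moved away from the decision boundary.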

CLC number: TP391.4

Yang WANG, Hongliang FU, Huawei TAO, Jing YANG, Yue XIE, Li ZHAO. Cross-corpus speech emotion recognition based on decision boundary optimized domain adaptation[J]. Journal of Computer Applications, 2023, 43(2): 374-379.

References (22)

1	LI H F, CHEN J, MA L, et al. Dimensional speech emotion recognition review [J]. Journal of Software, 2020, 31(8): 2465-2491 (in Chinese). DOI: 10.13328/j.cnki.jos.006078
2	LUO H, HAN J Q. Nonnegative matrix factorization based transfer subspace learning for cross-corpus speech emotion recognition [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 2047-2060. DOI: 10.1109/taslp.2020.3006331
3	ZHANG W J, SONG P. Transfer sparse discriminant subspace learning for cross-corpus speech emotion recognition [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 307-318. DOI: 10.1109/taslp.2019.2955252
4	LUO H, HAN J Q. Cross-corpus speech emotion recognition using semi-supervised transfer non-negative matrix factorization with adaptation regularization [C]// Proceedings of the Interspeech 2019. [S.l.]: International Speech Communication Association, 2019: 3247-3251. DOI: 10.21437/interspeech.2019-2041
5	ZHANG J C, JIANG L, ZONG Y, et al. Cross-corpus speech emotion recognition using joint distribution adaptive regression [C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 3790-3794. DOI: 10.1109/icassp39728.2021.9414372
6	DENG J, XU X Z, ZHANG Z X, et al. Semisupervised autoencoders for speech emotion recognition [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(1): 31-43. DOI: 10.1109/taslp.2017.2759338
7	GIDEON J, McINNIS M G, PROVOST E M. Improving cross-corpus speech emotion recognition with Adversarial Discriminative Domain Generalization (ADDoG) [J]. IEEE Transactions on Affective Computing, 2021, 12(4): 1055-1068. DOI: 10.1109/TAFFC.2019.2916092
8	LEE S W. Domain generalization with triplet network for cross-corpus speech emotion recognition [C]// Proceedings of the 2021 IEEE Spoken Language Technology Workshop. Piscataway: IEEE, 2021: 389-396. DOI: 10.1109/slt48900.2021.9383534
9	ABDELWAHAB M, BUSSO C. Domain adversarial for acoustic emotion recognition [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(12): 2423-2435. DOI: 10.1109/taslp.2018.2867099
10	LIU J T, ZHENG W M, ZONG Y, et al. Cross-corpus speech emotion recognition based on deep domain-adaptive convolutional neural network [J]. IEICE Transactions on Information and Systems, 2020, E-103(2): 459-463. DOI: 10.1587/transinf.2019edl8136
11	MUSTAQEEM, KWON S. MLT-DNet: speech emotion recognition using 1D dilated CNN based on multi-learning trick approach [J]. Expert Systems with Applications, 2021, 167: No.114177. DOI: 10.1016/j.eswa.2020.114177
12	ZHAO J F, MAO X, CHEN L J. Speech emotion recognition using deep 1D & 2D CNN LSTM networks [J]. Biomedical Signal Processing and Control, 2019, 47: 312-323. DOI: 10.1016/j.bspc.2018.08.035
13	SONG P, ZHENG W M, OU S F, et al. Cross-corpus speech emotion recognition based on transfer non-negative matrix factorization [J]. Speech Communication, 2016, 83: 34-41. DOI: 10.1016/j.specom.2016.07.010
14	WANG W, LI H J, DING Z M, et al. Rethinking maximum mean discrepancy for visual domain adaptation [J]. IEEE Transactions on Neural Networks and Learning Systems, 2021 (Early Access): 1-14. DOI: 10.1109/tnnls.2021.3093468
15	CUI S H, WANG S H, ZHUO J B, et al. Towards discriminability and diversity: batch nuclear-norm maximization under label insufficient situations [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3940-3949. DOI: 10.1109/cvpr42600.2020.00400
16	RECHT B, FAZEL M, PARRILO P A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization [J]. SIAM Review, 2010, 52(3): 471-501. DOI: 10.1137/070697835
17	BURKHARDT F, PAESCHKE A, ROLFES M, et al. A database of German emotional speech [C]// Proceedings of the Interspeech 2005. [S.l.]: International Speech Communication Association, 2005: 1517-1520. DOI: 10.21437/interspeech.2005-446
18	MARTIN O, KOTSIA I, MACQ B, et al. The eNTERFACE'05 audio-visual emotion database [C]// Proceedings of the 22nd International Conference on Data Engineering Workshops. Piscataway: IEEE, 2006: 8. DOI: 10.1109/icdew.2006.145
19	TAO J H, LIU F Z, ZHANG M, et al. Design of speech corpus for mandarin text to speech [C/OL]// Proceedings of the Blizzard Challenge 2008 Workshop. [2021-09-21].
20	SCHULLER B, STEIDL S, BATLINER A, et al. The INTERSPEECH 2010 paralinguistic challenge [C]// Proceedings of the Interspeech 2010. [S.l.]: International Speech Communication Association, 2010: 2794-2797. DOI: 10.21437/interspeech.2010-739
21	EYBEN F, WÖLLMER M, SCHULLER B. openSMILE: the Munich versatile and fast open-source audio feature extractor [C]// Proceedings of the 18th ACM International Conference on Multimedia. New York: ACM, 2010: 1459-1462. DOI: 10.1145/1873951.1874246
22	ZHUANG Z H, FU H L, TAO H W, et al. Cross-corpus speech emotion recognition based on deep autoencoder subdomain adaptation [J]. Application Research of Computers, 2021, 38(11): 3279-3282, 3348 (in Chinese).

Layer	Kernel (n×k×s)	Output size (b×n×f)
Conv1D	16×9×2	16×16×791
Conv1D	32×9×2	16×32×396
Conv1D	64×9×2	16×64×198
Conv1D	128×9×2	16×128×99
Flatten	-	16×12672
Fully connected	-	16×2048
Fully connected	-	16×5
softmax	Classifier	16×5
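The output sizes in the layer table can be reproduced with the standard 1D convolution length formula. The sketch below reads the kernel column as (number of filters n, kernel size k, stride s) and the output column as (batch b, channels n, feature length f), which is an interpretation of the headers. The input length of 1582 (the size of the INTERSPEECH 2010 feature set cited as reference 20) and a padding of 4 on each side are assumptions: the page does not state them, but they are consistent with every row of the table.

```python
def conv1d_out_len(length, kernel, stride, padding):
    """Standard output-length formula for a 1D convolution (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


# (filters n, kernel size k, stride s) for each Conv1D layer in the table.
LAYERS = [(16, 9, 2), (32, 9, 2), (64, 9, 2), (128, 9, 2)]

BATCH = 16          # b, the first output dimension in the table
INPUT_LEN = 1582    # assumption: INTERSPEECH 2010 feature dimension
PADDING = 4         # assumption: padding of (k - 1) // 2 on each side

length = INPUT_LEN
shapes = []
for filters, kernel, stride in LAYERS:
    length = conv1d_out_len(length, kernel, stride, PADDING)
    shapes.append((BATCH, filters, length))

# Flattening merges the channel and length axes of the last Conv1D output.
flattened = shapes[-1][1] * shapes[-1][2]

for shape in shapes:
    print("Conv1D ->", shape)
print("Flatten ->", (BATCH, flattened))
```

Each stride-2 layer roughly halves the feature length, and the flattened width of 128 × 99 matches the 12672 listed for the flatten layer.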

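Source domain	Target domain	Shared emotion categories
eNTERFACE (e)	Berlin (B)	anger, disgust, fear, happiness, sadness
Berlin (B)	eNTERFACE (e)	anger, disgust, fear, happiness, sadness
CASIA (C)	eNTERFACE (e)	anger, fear, happiness, sadness, surprise
eNTERFACE (e)	CASIA (C)	anger, fear, happiness, sadness, surprise
CASIA (C)	Berlin (B)	anger, fear, happiness, neutral, sadness
Berlin (B)	CASIA (C)	anger, fear, happiness, neutral, sadness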
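Each cross-corpus task keeps only the emotion categories that the source and target corpora have in common. A minimal sketch of that selection as a set intersection follows. The full label inventories in `CORPORA` are assumptions based on the commonly reported categories of the three corpora and are not listed on this page. The `source2target` naming in `task_name` is inferred from task labels such as e2B in the result tables.

```python
# Emotion inventories per corpus. These label sets are assumptions taken from
# the corpora's commonly reported categories, not from this page.
CORPORA = {
    "B": {"anger", "boredom", "disgust", "fear", "happiness", "neutral", "sadness"},
    "e": {"anger", "disgust", "fear", "happiness", "sadness", "surprise"},
    "C": {"anger", "fear", "happiness", "neutral", "sadness", "surprise"},
}

# (source, target) pairs in the order used by the table.
PAIRS = [("e", "B"), ("B", "e"), ("C", "e"), ("e", "C"), ("C", "B"), ("B", "C")]


def shared_emotions(source, target):
    """Emotion labels present in both corpora, sorted alphabetically."""
    return sorted(CORPORA[source] & CORPORA[target])


def task_name(source, target):
    """Task label in the inferred 'source2target' form, for example 'e2B'."""
    return f"{source}2{target}"


for source, target in PAIRS:
    print(task_name(source, target), shared_emotions(source, target))
```

Under these assumed inventories every pair shares exactly five categories, which agrees with the five-way classifier output in the layer table.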

Task	O-CNN	CNN+MMD	DBODA
Average	38.84	39.97	44.26
e2B	47.55	48.13	52.99
B2e	37.34	38.93	41.33
C2e	34.72	35.56	36.67
e2C	34.93	35.95	37.29
C2B	50.71	52.26	54.25
B2C	38.64	39.28	43.04

Task	SVM	TSDSL [3]	JDAR [5]	DBODA
Average	33.25	40.46	38.97	44.26
e2B	32.00	47.41	48.74	52.99
B2e	32.47	35.44	38.14	41.33
C2e	25.69	33.25	28.43	36.67
e2C	27.40	32.50	30.30	37.29
C2B	44.12	56.74	49.58	54.25
B2C	37.80	37.40	38.60	43.04

Task	DANN [9]	DDACNN [10]	DASA [22]	DBODA
Average	42.58	38.78	42.25	44.26
e2B	52.67	49.93	52.35	52.99
B2e	36.53	34.51	40.11	41.33
C2e	29.17	31.59	32.09	36.67
e2C	36.60	31.90	36.10	37.29
C2B	57.64	46.62	51.47	54.25
B2C	42.89	38.10	41.40	43.04
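The range of 1.68 to 11.01 percentage points quoted in the abstract can be checked against the average rows of the two comparison tables. The short sketch below copies the numbers from those tables, recomputes the DBODA average from its six per-task accuracies, and derives the margin over each baseline. The variable names are illustrative only.

```python
# Average accuracy (%) of each baseline, copied from the comparison tables.
BASELINE_AVERAGE = {
    "SVM": 33.25,
    "TSDSL": 40.46,
    "JDAR": 38.97,
    "DANN": 42.58,
    "DDACNN": 38.78,
    "DASA": 42.25,
}

# Per-task accuracy (%) of DBODA, copied from the same tables.
DBODA_PER_TASK = {
    "e2B": 52.99,
    "B2e": 41.33,
    "C2e": 36.67,
    "e2C": 37.29,
    "C2B": 54.25,
    "B2C": 43.04,
}

# Mean over the six tasks, rounded to two decimals as in the tables.
dboda_average = round(sum(DBODA_PER_TASK.values()) / len(DBODA_PER_TASK), 2)

# Margin of DBODA over each baseline, in percentage points.
margins = {
    name: round(dboda_average - accuracy, 2)
    for name, accuracy in BASELINE_AVERAGE.items()
}

print("DBODA average:", dboda_average)
for name, margin in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{name}: +{margin:.2f}")
```

The smallest margin is over DANN and the largest is over SVM, which correspond to the two ends of the range stated in the abstract.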
