Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (2): 369-376. DOI: 10.11772/j.issn.1001-9081.2023020185

• Artificial Intelligence •


Multimodal emotion recognition method based on multiscale convolution and self-attention feature fusion

Tian CHEN 1,2,3, Conghu CAI 1,2,3, Xiaohui YUAN 1,4, Beibei LUO 1,2,3

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230009, China
    2. Intelligent Interconnected Systems Laboratory of Anhui Province, Hefei, Anhui 230009, China
    3. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei, Anhui 230009, China
    4. Department of Computer Science and Engineering, University of North Texas, Denton, Texas 76207, USA
  • Received: 2023-02-27 Revised: 2023-04-02 Accepted: 2023-04-07 Online: 2024-02-22 Published: 2024-02-10
  • Contact: Tian CHEN
  • About author: CAI Conghu, born in 1998 in Suzhou, Anhui, M. S. candidate. His research interests include affective computing, artificial intelligence, design for testability, low-power testing.
    YUAN Xiaohui, born in 1973 in Hefei, Anhui, Ph. D., professor. His research interests include computer vision, artificial intelligence, data mining, machine learning.
    LUO Beibei, born in 1999 in Hefei, Anhui, M. S. candidate. Her research interests include affective computing, artificial intelligence, design for testability, low-power testing.
  • Supported by:
    National Natural Science Foundation of China(62174048)


Abstract:

Emotion recognition based on physiological signals is affected by noise and other factors, which leads to low accuracy and weak cross-individual generalization. To address these issues, a multimodal emotion recognition method based on ElectroEncephaloGram (EEG), ElectroCardioGram (ECG), and eye movement signals was proposed. First, multi-scale convolution was applied to the physiological signals to obtain higher-dimensional signal features while reducing the parameter count. Second, self-attention was employed in the fusion of the multimodal signal features to increase the weights of key features and reduce inter-modality feature interference. Finally, a Bi-directional Long Short-Term Memory (Bi-LSTM) network was used to extract temporal information from the fused features and perform classification. Experimental results show that the proposed method achieves recognition accuracies of 90.29%, 91.38%, and 83.53% on the valence, arousal, and valence/arousal four-class tasks, respectively, improvements of 3.46 to 7.11 and 0.92 to 3.15 percentage points over the EEG single-modality and EEG+ECG bimodal methods. The proposed method recognizes emotions accurately and is more stable across individuals.
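
The abstract outlines a three-stage architecture: parallel multi-scale 1D convolutions per modality, self-attention over the fused multimodal features, and a Bi-LSTM classifier. Below is a minimal PyTorch sketch of such a pipeline; every concrete choice, including channel counts, kernel sizes, attention heads, hidden sizes, and fusing the modalities by concatenation along the time axis, is an illustrative assumption rather than the authors' implementation.

# Minimal sketch of the pipeline described in the abstract (PyTorch).
# All shapes and sizes are illustrative assumptions; the paper's actual
# configuration is not given here.
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Parallel 1D convolutions with different kernel sizes over one modality."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):  # x: (batch, in_ch, time)
        # Concatenate the per-scale feature maps along the channel axis.
        return torch.cat([b(x) for b in self.branches], dim=1)

class FusionNet(nn.Module):
    def __init__(self, eeg_ch=32, ecg_ch=2, eye_ch=4, feat=32, n_classes=4):
        super().__init__()
        d = feat * 3  # three convolution scales per modality
        self.eeg = MultiScaleConv(eeg_ch, feat)
        self.ecg = MultiScaleConv(ecg_ch, feat)
        self.eye = MultiScaleConv(eye_ch, feat)
        # Self-attention re-weights key features across the fused sequence.
        self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
        self.lstm = nn.LSTM(d, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(64 * 2, n_classes)

    def forward(self, eeg, ecg, eye):  # each: (batch, channels, time)
        feats = [self.eeg(eeg), self.ecg(ecg), self.eye(eye)]
        x = torch.cat(feats, dim=2).transpose(1, 2)  # (batch, 3*time, d)
        x, _ = self.attn(x, x, x)                    # fuse with self-attention
        x, _ = self.lstm(x)                          # Bi-LSTM temporal modeling
        return self.head(x[:, -1])                   # classify from the last step

model = FusionNet()
out = model(torch.randn(8, 32, 128), torch.randn(8, 2, 128), torch.randn(8, 4, 128))
print(out.shape)  # torch.Size([8, 4])

Concatenating the per-modality sequences along the time axis before attention is only one plausible reading of "self-attention in the fusion of the multimodal signal features"; stacking the three modalities as parallel tokens per time step would be an equally reasonable layout.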

Key words: ElectroEncephaloGram (EEG), self-attention, ElectroCardioGram (ECG), eye movement, multimodal, emotion recognition

CLC Number: