Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (2): 369-376.DOI: 10.11772/j.issn.1001-9081.2023020185
Special Issue: Artificial Intelligence
					
Tian CHEN1,2,3, Conghu CAI1,2,3, Xiaohui YUAN1,4, Beibei LUO1,2,3
Received: 2023-02-27
Revised: 2023-04-02
Accepted: 2023-04-07
Online: 2024-02-22
Published: 2024-02-10
Contact: Tian CHEN
About author: CAI Conghu, born in 1998, M.S. candidate. His research interests include affective computing, artificial intelligence, design for testability, and low-power testing.
        
                   
Tian CHEN, Conghu CAI, Xiaohui YUAN, Beibei LUO. Multimodal emotion recognition method based on multiscale convolution and self-attention feature fusion[J]. Journal of Computer Applications, 2024, 44(2): 369-376.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023020185
| Band | Frequency range/Hz | Brain activity state |
|---|---|---|
| δ | [0.5, 4) | Activity related to deep sleep |
| θ | [4, 8) | Drowsiness, meditative state |
| α | [8, 16) | Relaxed wakefulness with eyes closed |
| β | [16, 32) | Focused attention, emotional fluctuation |
| γ | [32, 45) | Excited, agitated, or intense emotional states |

Tab. 1 Brain activities corresponding to different frequency bands of EEG
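The five bands in Tab. 1 are conventionally obtained by filtering the raw EEG signal. As a minimal illustrative sketch — not the paper's actual preprocessing pipeline, which is not specified here — the bands can be separated by FFT masking in NumPy. The band edges follow Tab. 1; the 128 Hz sampling rate and the `split_bands` helper name are assumptions.

```python
import numpy as np

# EEG frequency bands (Hz); names and ranges follow Tab. 1.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 16),
         "beta": (16, 32), "gamma": (32, 45)}

def split_bands(signal, fs):
    """Split a 1-D EEG signal into the five bands by zeroing out-of-band
    FFT bins and inverting. Illustrative only; a real pipeline would use
    proper bandpass filters."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    out = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        out[name] = np.fft.irfft(spectrum * mask, n=len(signal))
    return out

fs = 128  # assumed sampling rate (common for downsampled EEG datasets)
t = np.arange(fs * 4) / fs
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
bands = split_bands(x, fs)
# The 10 Hz component should land in the alpha band, the 40 Hz one in gamma.
```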
| Extraction method | Valence ACC | Valence STD | Arousal ACC | Arousal STD |
|---|---|---|---|---|
| SVM | 52.28 | 12.11 | 53.92 | 11.74 |
| CNN | 62.84 | 10.24 | 65.97 | 9.98 |
| 1D-Inception | 81.26 | 8.77 | 83.97 | 7.91 |

Tab. 2 Accuracy comparison of 1D-Inception with other feature extraction methods
| Sequence length | Valence ACC | Valence STD | Arousal ACC | Arousal STD |
|---|---|---|---|---|
| 1 | 76.48 | 8.39 | 73.44 | 7.61 |
| 3 | 80.00 | 6.27 | 65.97 | 9.90 |
| 6 | 90.29 | 6.28 | 91.38 | 6.02 |
| 10 | 89.92 | 6.49 | 90.08 | 6.07 |
| 15 | 85.12 | 6.11 | 88.30 | 5.98 |

Tab. 3 Comparison of experimental results with different sequence lengths of Bi-LSTM
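Tab. 3 indicates that grouping per-window feature vectors into sequences of length 6 works best for the Bi-LSTM. A hypothetical sketch of that windowing step — the feature dimension 128, the 60-window input, and the `make_sequences` helper are illustrative, not taken from the paper:

```python
import numpy as np

def make_sequences(features, seq_len=6):
    """Group consecutive per-window feature vectors into fixed-length
    sequences for a recurrent model. seq_len=6 follows Tab. 3."""
    n = (len(features) // seq_len) * seq_len  # drop the incomplete tail
    return features[:n].reshape(-1, seq_len, features.shape[-1])

feats = np.random.randn(60, 128)  # 60 feature vectors of assumed dim 128
seqs = make_sequences(feats)
print(seqs.shape)  # (10, 6, 128)
```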
| Multimodal fusion method | Valence ACC | Valence STD | Arousal ACC | Arousal STD | V/A four-class ACC | V/A four-class STD |
|---|---|---|---|---|---|---|
| Direct fusion | 84.77 | 7.76 | 86.92 | 7.74 | 75.26 | 13.42 |
| Decision-level fusion | 85.26 | 7.21 | 88.30 | 6.98 | 79.55 | 10.29 |
| Self-attention fusion | 90.29 | 6.28 | 91.38 | 6.02 | 83.53 | 9.77 |

Tab. 4 Accuracy comparison between self-attention-based fusion method and other fusion methods
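The self-attention fusion compared in Tab. 4 treats each modality's feature vector as a token and lets every modality attend to the others before pooling. A minimal single-head NumPy sketch of scaled dot-product attention [22], with the learned Q/K/V projection matrices omitted for brevity (so queries, keys, and values are the stacked modality features themselves); the feature dimension 64 and all function names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fuse(tokens):
    """Single-head scaled dot-product self-attention over modality tokens,
    followed by mean pooling into one fused representation. Learned
    projections are omitted, so Q = K = V = tokens."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)  # (3, 3) modality affinities
    attn = softmax(scores, axis=-1)          # rows sum to 1
    fused = attn @ tokens                    # each modality attends to all
    return fused.mean(axis=0)

rng = np.random.default_rng(0)
eeg, ecg, eye = rng.standard_normal((3, 64))  # toy modality features
fused = self_attention_fuse(np.stack([eeg, ecg, eye]))
print(fused.shape)  # (64,)
```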
| Modalities used | Valence ACC | Valence STD | Arousal ACC | Arousal STD | V/A four-class ACC | V/A four-class STD |
|---|---|---|---|---|---|---|
| EEG | 84.17 | 7.40 | 87.92 | 7.23 | 76.42 | 10.47 |
| ECG | 77.84 | 10.24 | 69.97 | 9.98 | 45.39 | 11.20 |
| Eye movement | 65.20 | 12.01 | 70.13 | 11.89 | 39.28 | 13.09 |
| EEG+ECG (bimodal) | 89.37 | 6.97 | 88.23 | 6.73 | 82.26 | 9.97 |
| EEG+eye movement (bimodal) | 84.79 | 7.82 | 86.30 | 7.54 | 78.10 | 10.59 |
| Trimodal (proposed method) | 90.29 | 6.28 | 91.38 | 6.02 | 83.53 | 9.77 |

Tab. 5 Accuracy comparison of multimodal method with unimodal and bimodal methods
| Method | Valence ACC | Arousal ACC |
|---|---|---|
| Ref. [ ] | 76.56 | 80.46 |
| Ref. [ ] | 84.00 | 72.00 |
| Ref. [ ] | 85.38 | 77.52 |
| Ref. [ ] | 86.61 | 85.34 |
| Ref. [ ] | 91.82 | 88.24 |
| Proposed method | 90.29 | 91.38 |

Tab. 6 Accuracy comparison with existing physiological signal-based emotion recognition methods
References

[1] JENKE R, PEER A, BUSS M. Feature extraction and selection for emotion recognition from EEG [J]. IEEE Transactions on Affective Computing, 2014, 5(3): 327-339. 10.1109/taffc.2014.2339834
[2] KRUMPAL I. Determinants of social desirability bias in sensitive surveys: a literature review [J]. Quality & Quantity, 2013, 47: 2025-2047. 10.1007/s11135-011-9640-9
[3] BOULAY B. Towards a motivationally intelligent pedagogy: how should an intelligent tutor respond to the unmotivated or the demotivated? [C]// New Perspectives on Affect and Learning Technologies. New York: Springer, 2011, 3: 41-52. 10.1007/978-1-4419-9625-1_4
[4] ZHAO G, GE Y, SHEN B, et al. Emotion analysis for personality inference from EEG signals [J]. IEEE Transactions on Affective Computing, 2018, 9(3): 362-371. 10.1109/taffc.2017.2786207
[5] CHAO H, DONG L. Emotion recognition using three-dimensional feature and convolutional neural network from multichannel EEG signals [J]. IEEE Sensors Journal, 2021, 21(2): 2024-2034. 10.1109/jsen.2020.3020828
[6] SONG T, ZHENG W, SONG P, et al. EEG emotion recognition using dynamical graph convolutional neural networks [J]. IEEE Transactions on Affective Computing, 2020, 11(3): 532-541. 10.1109/taffc.2018.2817622
[7] CHEN T, YIN H, YUAN X, et al. Emotion recognition based on fusion of long short-term memory networks and SVMs [J]. Digital Signal Processing, 2021, 117: 103153. 10.1016/j.dsp.2021.103153
[8] KATSIGIANNIS S, RAMZAN N. DREAMER: a database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices [J]. IEEE Journal of Biomedical and Health Informatics, 2017, 22(1): 98-107. 10.1109/jbhi.2017.2688239
[9] CHEN T, JU S, REN F, et al. EEG emotion recognition model based on the LIBSVM classifier [J]. Measurement, 2020, 164: 108047. 10.1016/j.measurement.2020.108047
[10] CHEN T, FAN M Y, REN F J, et al. Emotion recognition using pupil position [J]. Application Research of Computers, 2021, 38(6): 1765-1769. 10.19734/j.issn.1001-3695.2020.06.0165
[11] ALARCÃO S M, FONSECA M J. Emotions recognition using EEG signals: a survey [J]. IEEE Transactions on Affective Computing, 2017, 10(3): 374-393.
[12] SINGSON L N B, SANCHEZ M T U R, VILLAVERDE J F. Emotion recognition using short-term analysis of heart rate variability and ResNet architecture [C]// Proceedings of the 2021 13th International Conference on Computer and Automation Engineering. Piscataway: IEEE, 2021: 15-18. 10.1109/iccae51876.2021.9426094
[13] CHEN J X, ZHANG P W, MAO Z J, et al. Accurate EEG-based emotion recognition on combined features using deep convolutional neural networks [J]. IEEE Access, 2019, 7: 44317-44328. 10.1109/access.2019.2908285
[14] KOELSTRA S, MUHL C, SOLEYMANI M, et al. DEAP: a database for emotion analysis using physiological signals [J]. IEEE Transactions on Affective Computing, 2011, 3(1): 18-31. 10.1109/t-affc.2011.15
[15] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1-9. 10.1109/cvpr.2015.7298594
[16] SANTAMARÍA-VÁZQUEZ E, MARTÍNEZ-CAGIGAL V, VAQUERIZO-VILLAR F, et al. EEG-Inception: a novel deep convolutional neural network for assistive ERP-based brain-computer interfaces [J]. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2020, 28(12): 2773-2782. 10.1109/tnsre.2020.3048106
[17] LEE D-Y, JEONG J-H, SHIM K-H, et al. Classification of upper limb movements using convolutional neural network with 3D inception block [C]// Proceedings of the 2020 8th International Winter Conference on Brain-Computer Interface. Piscataway: IEEE, 2020: 1-5. 10.1109/bci48061.2020.9061671
[18] KWON Y-H, SHIN S-B, KIM S-D. Electroencephalography based fusion two-dimensional (2D)-convolution neural networks (CNN) model for emotion recognition system [J]. Sensors, 2018, 18(5): 1383. 10.3390/s18051383
[19] TAO W, LI C, SONG R, et al. EEG-based emotion recognition via channel-wise attention and self attention [J]. IEEE Transactions on Affective Computing, 2023, 14(1): 382-393. 10.1109/taffc.2020.3025777
[20] DZEDZICKIS A, KAKLAUSKAS A, BUCINSKAS V. Human emotion recognition: review of sensors and methods [J]. Sensors, 2020, 20(3): 592. 10.3390/s20030592
[21] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift [C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 448-456.
[22] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [EB/OL]. (2017-12-06)[2023-02-01].
[23] DALY I, WILLIAMS D, MALIK A, et al. Personalised, multi-modal, affective state detection for hybrid brain-computer music interfacing [J]. IEEE Transactions on Affective Computing, 2018, 11(1): 111-124.
[24] DONOHO D L, JOHNSTONE I M. Ideal spatial adaptation by wavelet shrinkage [J]. Biometrika, 1994, 81(3): 425-455. 10.1093/biomet/81.3.425
[25] DIKKER S, WAN L, DAVIDESCO I, et al. Brain-to-brain synchrony tracks real-world dynamic group interactions in the classroom [J]. Current Biology, 2017, 27(9): 1375-1380. 10.1016/j.cub.2017.04.002
[26] LI W, CHU M, QIAO J. Design of a hierarchy modular neural network and its application in multimodal emotion recognition [J]. Soft Computing, 2019, 23: 11817-11828. 10.1007/s00500-018-03735-0
[27] WU X, ZHENG W-L, LI Z, et al. Investigating EEG-based functional connectivity patterns for multimodal emotion recognition [J]. Journal of Neural Engineering, 2022, 19: 016012. 10.1088/1741-2552/ac49a7
[28] LI L B, CHEN T, REN F J, et al. Bimodal emotion recognition method based on graph neural network and attention [J]. Journal of Computer Applications, 2023, 43(3): 700-705. 10.11772/j.issn.1001-9081.2022020216
[29] PEREIRA E T, GOMES H M, VELOSO L R, et al. Empirical evidence relating EEG signal duration to emotion classification performance [J]. IEEE Transactions on Affective Computing, 2021, 12(1): 154-164. 10.1109/taffc.2018.2854168