Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (6): 1869-1875.

Special Issue: Artificial Intelligence

End-to-end speech emotion recognition based on multi-head attention

Lei YANG1, Hongdong ZHAO1, Kuaikuai YU2

  1. School of Electronics and Information Engineering, Hebei University of Technology, Tianjin 300401, China
  2. Science and Technology on Electro-Optical Information Security Control Laboratory, Tianjin 300308, China
  • Received: 2021-04-14 Revised: 2021-07-19 Accepted: 2021-07-23 Online: 2022-06-22 Published: 2022-06-10
Speech emotion datasets are small and their data are high-dimensional. Traditional Recurrent Neural Networks (RNNs) lose long-range dependencies, and Convolutional Neural Networks (CNNs), which focus on local information, fail to fully capture the latent relationships between frames within the input sequence. To address these problems, a new neural network, MHA-SVM, based on Multi-Head Attention (MHA) and Support Vector Machine (SVM), was proposed for Speech Emotion Recognition (SER). First, the original audio data were input into the MHA network to train its parameters and obtain the MHA classification results. Then, the same original audio data were input into the pre-trained MHA network again for feature extraction. Finally, the extracted features were passed through the fully connected layer and fed into the SVM to obtain the MHA-SVM classification results. After a thorough evaluation of how the number of heads and layers in the MHA module affects the experimental results, MHA-SVM achieved its highest recognition accuracy of 69.6% on the IEMOCAP dataset. The experimental results indicate that the end-to-end model based on the MHA mechanism is better suited to SER tasks than models based on RNN and CNN.
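To illustrate how multi-head attention relates every frame of an utterance to every other frame, the following is a minimal NumPy sketch of a single multi-head self-attention layer followed by mean pooling. It is not the authors' implementation: the function names, the random projection matrices, the sizes (50 frames, 64 dimensions, 8 heads) and the mean-pooling step are all assumptions chosen for illustration, and the weights here are untrained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product multi-head self-attention over one sequence.

    x: array of shape (T, d_model), one row per frame.
    Returns the attended sequence (T, d_model) and the attention
    weights (num_heads, T, T).
    """
    t, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(m):
        # (T, d_model) -> (num_heads, T, d_head)
        return m.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Every frame scores every other frame, so distant frames interact directly.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    heads = weights @ v  # (num_heads, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(t, d_model)
    return concat @ w_o, weights


# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
T, D_MODEL, NUM_HEADS = 50, 64, 8
frames = rng.standard_normal((T, D_MODEL))
w_q, w_k, w_v, w_o = (
    rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL) for _ in range(4)
)

attended, attn = multi_head_self_attention(frames, w_q, w_k, w_v, w_o, NUM_HEADS)

# Assumed pooling step: average over frames to get one vector per utterance.
utterance_feature = attended.mean(axis=0)
```

In a two-stage setup like the one described above, a fixed-length vector such as `utterance_feature`, taken from a trained attention network, is the kind of input a separate SVM classifier could be fitted on.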

Key words: Speech Emotion Recognition (SER), Multi-Head Attention (MHA), Convolutional Neural Network (CNN), Support Vector Machine (SVM), end-to-end




