Speech emotion recognition method based on hybrid Siamese network with CNN and bidirectional GRU
Peng PENG, Ziting CAI, Wenling LIU, Caihua CHEN, Wei ZENG, Baolai HUANG
Journal of Computer Applications    2025, 45 (8): 2515-2521.   DOI: 10.11772/j.issn.1001-9081.2024081142

To address the low accuracy and poor generalization of existing Speech Emotion Recognition (SER) models, a hybrid Siamese multi-scale CNN-BiGRU network was proposed. A Multi-Scale Feature Extractor (MSFE) and a Multi-Dimensional Attention (MDA) module were introduced to construct the Siamese network, and training on sample pairs enlarged the effective training set, which improved the model’s recognition accuracy and helped it adapt to complex real-world application scenarios. Experimental results on the public IEMOCAP and EMO-DB datasets show that the recognition accuracy of the proposed model is 8.28 and 7.79 percentage points higher, respectively, than that of the CNN-BiGRU model. In addition, a customer service speech emotion dataset was constructed from recordings of real customer service conversations. On this dataset the proposed model reaches a recognition accuracy of 87.85%, indicating good generalization ability.
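
The abstract does not say how the sample pairs are formed, so the following is only a minimal sketch of one common scheme for Siamese training: every unordered pair of labeled utterances becomes one training example, tagged 1 if the two emotion labels match and 0 otherwise. The function name `make_pairs`, the utterance IDs, and the labels are hypothetical placeholders, not taken from the paper.

```python
from itertools import combinations


def make_pairs(samples):
    """Build Siamese training pairs from (features, label) tuples.

    Assumption: all unordered pairs are used; the paper may sample
    or balance pairs differently.
    Returns a list of (features_a, features_b, same_label) tuples.
    """
    pairs = []
    for (xa, ya), (xb, yb) in combinations(samples, 2):
        # same_label is 1 when both utterances carry the same emotion
        pairs.append((xa, xb, int(ya == yb)))
    return pairs


# Hypothetical toy data: strings stand in for acoustic feature tensors.
utterances = [("u1", "angry"), ("u2", "angry"), ("u3", "happy"), ("u4", "sad")]
pairs = make_pairs(utterances)
print(len(utterances), "utterances ->", len(pairs), "pairs")
```

Under this scheme n utterances yield n(n-1)/2 pairs, which is one way pairing can enlarge the training set. In practice matching and non-matching pairs are often rebalanced, because non-matching pairs usually dominate.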
