To address the low accuracy and poor generalization ability of existing Speech Emotion Recognition (SER) models, a hybrid Siamese Multi-scale CNN-BiGRU network was proposed. In this network, a Multi-Scale Feature Extractor (MSFE) and a Multi-Dimensional Attention (MDA) module were introduced to construct a Siamese architecture, and the effective training data were enlarged by constructing sample pairs, thereby improving recognition accuracy and enabling the model to better adapt to complex real-world application scenarios. Experimental results on the public IEMOCAP and EMO-DB datasets show that the proposed model improves recognition accuracy by 8.28 and 7.79 percentage points, respectively, over the baseline CNN-BiGRU model. Furthermore, a customer service speech emotion dataset was constructed from recordings of real customer service conversations. Experimental results on this dataset show that the proposed model reaches a recognition accuracy of 87.85%, indicating that it has good generalization ability.
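The data-enlarging effect of pair-based Siamese training can be illustrated with a minimal sketch: pairing N labeled utterances yields up to N·(N−1)/2 training pairs, each tagged with whether the two utterances share an emotion label. The function and variable names below are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations
import random

def make_pairs(samples, labels, seed=0):
    """Build (index_a, index_b, same_label) pairs for Siamese training.
    N samples yield N*(N-1)/2 candidate pairs, which is how pair-based
    training enlarges the effective training set."""
    rng = random.Random(seed)
    pairs = [(i, j, int(labels[i] == labels[j]))
             for i, j in combinations(range(len(samples)), 2)]
    rng.shuffle(pairs)  # mix same- and different-emotion pairs
    return pairs

# Hypothetical example: 6 utterances covering 3 emotion classes
feats = [f"utt{i}" for i in range(6)]
emos = ["angry", "happy", "angry", "sad", "happy", "sad"]
pairs = make_pairs(feats, emos)
print(len(pairs))  # 15 pairs from only 6 samples
```

In practice the pair list is usually balanced or subsampled so that same-emotion and different-emotion pairs appear in comparable proportions.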