Classroom speech emotion recognition method based on multi-scale temporal-aware network
Juxiang ZHOU, Jinsheng LIU, Jianhou GAN, Di WU, Zijie LI
Journal of Computer Applications    2024, 44 (5): 1636-1643.   DOI: 10.11772/j.issn.1001-9081.2023050663

Speech emotion recognition has been widely used in multi-scenario intelligent systems in recent years, and it also makes intelligent analysis of teaching behaviors possible in smart classroom environments. Classroom speech emotion recognition technology can automatically recognize the emotional states of teachers and students during classroom teaching, helping teachers understand their own teaching styles and grasp students' classroom learning status in a timely manner, thereby supporting precise teaching. For the classroom speech emotion recognition task, firstly, classroom teaching videos were collected from primary and secondary schools, the audio was extracted and then manually segmented and annotated, so as to construct a primary and secondary school teaching speech emotion corpus containing six emotion categories. Secondly, based on the Temporal Convolutional Network (TCN) and a cross-gated mechanism, dual temporal convolution channels were designed to extract multi-scale cross-fusion features. Finally, a dynamic weight fusion strategy was adopted to adjust the contributions of features at different scales, reduce the interference of non-important features on the recognition results, and further enhance the representation and learning ability of the model. Experimental results show that the proposed method outperforms advanced models such as TIM-Net (Temporal-aware bI-direction Multi-scale Network), GM-TCNet (Gated Multi-scale Temporal Convolutional Network), and CTL-MTNet (CapsNet and Transfer Learning-based Mixed Task Net) on multiple public datasets, and its UAR (Unweighted Average Recall) and WAR (Weighted Average Recall) reach 90.58% and 90.45% respectively on the real classroom speech emotion recognition task.
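
The abstract only outlines the architecture, so the following is a minimal PyTorch sketch of the general idea (dual TCN channels, cross-gating between them, and softmax-weighted fusion of multi-scale features), not the authors' implementation. All names, layer sizes, kernel sizes, the 39-dimensional frame features, and the interpretation of "dynamic weight fusion" as learnable softmax weights over per-dilation outputs are assumptions made for illustration.

```python
# Hypothetical sketch of a dual-channel TCN with cross-gating and dynamic
# weight fusion; shapes and hyperparameters are assumed, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConvBlock(nn.Module):
    """One dilated causal Conv1d block used inside a TCN channel."""

    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # left padding keeps causality
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (B, C, T)
        out = F.pad(x, (self.pad, 0))                    # pad only on the left
        return torch.relu(self.conv(out))


class DualChannelTCN(nn.Module):
    """Two TCN channels with cross-gating and softmax-weighted scale fusion."""

    def __init__(self, in_dim=39, channels=64, n_scales=4, n_classes=6):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, channels, 1)
        dilations = [2 ** i for i in range(n_scales)]    # 1, 2, 4, 8
        self.chan_a = nn.ModuleList(CausalConvBlock(channels, 2, d) for d in dilations)
        self.chan_b = nn.ModuleList(CausalConvBlock(channels, 3, d) for d in dilations)
        self.scale_logits = nn.Parameter(torch.zeros(n_scales))  # "dynamic" fusion weights
        self.classifier = nn.Linear(channels, n_classes)

    def forward(self, x):                                # x: (B, T, in_dim)
        h = self.proj(x.transpose(1, 2))                 # (B, C, T)
        a, b = h, h
        scale_feats = []
        for block_a, block_b in zip(self.chan_a, self.chan_b):
            a_new, b_new = block_a(a), block_b(b)
            # Cross-gated mechanism: each channel is gated by the other channel.
            a = a_new * torch.sigmoid(b_new)
            b = b_new * torch.sigmoid(a_new)
            scale_feats.append((a + b).mean(dim=-1))     # temporal average per scale
        # Dynamic weight fusion: softmax weights adjust each scale's contribution.
        w = torch.softmax(self.scale_logits, dim=0)
        fused = sum(wi * f for wi, f in zip(w, scale_feats))
        return self.classifier(fused)                    # (B, n_classes) emotion logits


if __name__ == "__main__":
    model = DualChannelTCN()
    logits = model(torch.randn(2, 300, 39))              # 2 utterances, 300 frames each
    print(logits.shape)                                  # torch.Size([2, 6])
```

In this reading, the learnable softmax weights let training down-weight scales whose features are less informative, which matches the stated goal of reducing the interference of non-important features on the recognition results.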
