Journal of Computer Applications


Multi-feature fusion speech emotion recognition method based on SAA-CNN-BiLSTM network

昝志辉, 王雅静, 李珂, 杨智翔, 杨光宇

  1. Shandong University of Technology
  • Received: 2025-01-13  Revised: 2025-03-21  Online: 2025-04-27  Published: 2025-04-27
  • Corresponding author: 昝志辉

Abstract: To address the problems that a single speech emotion feature cannot fully represent speech information and that models underutilize speech features, a multi-feature fusion speech emotion recognition method based on an SAA-CNN-BiLSTM network was proposed. Noise, volume, and speed augmentation were introduced so that the model could learn diverse data characteristics, and fundamental frequency, time-domain, and frequency-domain features were fused to express emotional information comprehensively from different perspectives. On top of the BiLSTM network, a CNN was introduced to capture the spatial correlations of the input data and extract more representative features. A Simplified Additive Attention (SAA) mechanism was constructed in which the explicit query keys and query vectors were simplified away, so that the attention weights were computed without depending on specific query information; features of different dimensions could then be correlated and influence one another through these weights, allowing information to be exchanged and fused across features and improving their effective utilization. Experimental results show that the proposed method achieves accuracies of 87.02%, 82.59%, and 73.13% on the EMO-DB, CASIA, and SAVEE datasets, respectively, improving on methods such as IncConv, NHPC-BiLSTM, and DCRNN by 0.52-9.80, 2.92-23.09, and 3.13-16.63 percentage points on the three datasets, respectively.
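
For readers who want a concrete picture of the architecture described above, the sketch below (in PyTorch) shows one way a CNN front end, a BiLSTM, and a query-free "simplified" additive attention layer could be combined. All layer sizes, the 7-class output head, and other hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only: CNN + BiLSTM + simplified (query-free) additive attention.
# Layer widths, kernel size, and the 7-class head are assumptions for demonstration.
import torch
import torch.nn as nn

class SimplifiedAdditiveAttention(nn.Module):
    """Additive attention without an explicit query: weights come from the features alone."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.score = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                              # h: (batch, time, dim)
        e = self.score(torch.tanh(self.proj(h)))       # per-frame scores, (batch, time, 1)
        alpha = torch.softmax(e, dim=1)                # attention weights over time
        return (alpha * h).sum(dim=1)                  # pooled utterance vector, (batch, dim)

class CNNBiLSTMSAA(nn.Module):
    def __init__(self, n_features=16, n_classes=7, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # capture local spatial correlations
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.attn = SimplifiedAdditiveAttention(2 * hidden)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                              # x: (batch, time, n_features)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)   # (batch, time//2, 64)
        h, _ = self.bilstm(z)                              # (batch, time//2, 2*hidden)
        return self.fc(self.attn(h))                      # class logits

logits = CNNBiLSTMSAA()(torch.randn(4, 200, 16))
print(logits.shape)                                    # torch.Size([4, 7])
```

The attention layer scores each time step from the BiLSTM states alone, so no separate query vector is needed; the softmax weights then pool the frame sequence into a single utterance-level representation for classification.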

Key words: speech emotion recognition, deep learning, multi-feature fusion, data augmentation, long short-term memory neural network, simplified additive attention mechanism
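
As a companion sketch, the data augmentation and multi-feature extraction named in the keywords (noise/volume/speed augmentation; fundamental frequency, time-domain, and frequency-domain features) could look roughly like the following with librosa. The noise level, gain, stretch rate, and the particular feature set (zero-crossing rate, RMS energy, MFCCs) are assumptions for illustration, not the authors' pipeline.

```python
# Illustrative sketch only: augmentation and fused pitch / time-domain / frequency-domain
# features; parameter values are assumptions, not the authors' settings.
import numpy as np
import librosa

def augment(y, sr, noise_level=0.005, gain=1.2, rate=1.1):
    """Return noise-, volume-, and speed-augmented copies of a waveform."""
    noisy  = y + noise_level * np.random.randn(len(y))    # additive noise
    louder = gain * y                                     # volume change
    faster = librosa.effects.time_stretch(y, rate=rate)   # speed change
    return noisy, louder, faster

def extract_features(y, sr, n_mfcc=13):
    """Fuse fundamental frequency, time-domain, and frequency-domain features per frame."""
    f0   = librosa.yin(y, fmin=65.0, fmax=2093.0, sr=sr)      # fundamental frequency
    zcr  = librosa.feature.zero_crossing_rate(y)[0]           # time domain
    rms  = librosa.feature.rms(y=y)[0]                        # time domain
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # frequency domain
    n = min(len(f0), len(zcr), len(rms), mfcc.shape[1])       # align frame counts
    return np.vstack([f0[:n], zcr[:n], rms[:n], mfcc[:, :n]]).T  # (frames, 3 + n_mfcc)

sr = 16000
y = np.random.randn(2 * sr)            # stand-in waveform; replace with a loaded utterance
feats = extract_features(y, sr)
print(feats.shape)                      # (frames, 16), matching the model sketch above
```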

CLC number: