Journal of Computer Applications, 2025, Vol. 45, Issue (4): 1025-1034. DOI: 10.11772/j.issn.1001-9081.2024030319

• Artificial Intelligence •

基于语音和文本的双模态情感识别综述

韩令敏, 陈仙红, 熊文梦

  1. 北京工业大学 信息科学技术学院,北京 100124
  • 收稿日期:2024-03-21 修回日期:2024-05-20 接受日期:2024-05-27 发布日期:2024-06-18 出版日期:2025-04-10
  • 通讯作者: 陈仙红
  • 作者简介:韩令敏(1999—),女,河北唐山人,硕士研究生,主要研究方向:情感分析、情感识别
    陈仙红(1989—),女,福建龙岩人,副教授,博士,主要研究方向:情感识别、说话人识别、语音识别
    熊文梦(1989—),女,湖北荆州人,讲师,博士,主要研究方向:音频信号处理。
  • 基金资助:
    国家自然科学基金资助项目(62006010);北京市教育委员会科技/社科计划项目(KM202210005029)

Review on bimodal emotion recognition based on speech and text

Lingmin HAN, Xianhong CHEN, Wenmeng XIONG

  1. School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2024-03-21 Revised: 2024-05-20 Accepted: 2024-05-27 Online: 2024-06-18 Published: 2025-04-10
  • Contact: Xianhong CHEN
  • About author: HAN Lingmin, born in 1999, M. S. candidate. Her research interests include sentiment analysis and emotion recognition.
    CHEN Xianhong, born in 1989, Ph. D., associate professor. Her research interests include emotion recognition, speaker recognition, and speech recognition.
    XIONG Wenmeng, born in 1989, Ph. D., lecturer. Her research interests include audio signal processing.
  • Supported by:
    National Natural Science Foundation of China (62006010); R&D Program of Beijing Municipal Education Commission (KM202210005029)

摘要:

情感识别是一种让计算机识别和理解人类情感的技术,在众多领域都起着重要的作用,也是人工智能领域重要的发展方向。因此,梳理与归纳基于语音和文本的双模态情感识别的研究现状:首先,分类阐述情感表示空间;其次,按照情感数据库的情感表示空间对这些数据库进行分类,并总结常见的多模态情感数据库;再次,介绍基于语音和文本的双模态情感识别方法,包括特征提取、模态融合和决策分类,重点介绍模态融合方法并将这些方法分为特征级融合、决策级融合、模型级融合和多层次融合这4类;此外,比较和分析一系列语音和文本双模态情感识别方法的结果;最后,介绍情感识别的应用场景、面临的挑战与未来的发展方向。以上旨在对多模态情感识别,尤其是对基于语音和文本的双模态情感识别的相关工作进行分析与总结,并为情感识别提供有价值的参考。

关键词: 情感识别, 双模态, 模态融合, 语音, 文本

Abstract:

Emotion recognition is a technology that enables computers to recognize and understand human emotions. It plays an important role in many fields and is an important research direction in artificial intelligence. The research status of bimodal emotion recognition based on speech and text was therefore reviewed. Firstly, emotion representation spaces were classified and described. Secondly, emotion databases were classified according to their emotion representation spaces, and common multi-modal emotion databases were summarized. Thirdly, bimodal emotion recognition methods based on speech and text were introduced in terms of feature extraction, modal fusion, and decision classification, with emphasis on modal fusion methods, which were divided into four categories: feature-level fusion, decision-level fusion, model-level fusion, and multi-level fusion. In addition, the results of a series of speech-text bimodal emotion recognition methods were compared and analyzed. Finally, the application scenarios, challenges, and future development directions of emotion recognition were discussed. This review aims to analyze and summarize work on multi-modal emotion recognition, especially bimodal emotion recognition based on speech and text, and to provide a useful reference for emotion recognition research.

Key words: emotion recognition, bimodal, modal fusion, speech, text
