Journal of Computer Applications (《计算机应用》), official website


Cross-corpus speech emotion recognition based on decision boundary optimized domain adaptation (基于决策边界优化域自适应的跨库语音情感识别)

WANG Yang (汪洋)1, FU Hongliang (傅洪亮)1, TAO Huawei (陶华伟)1, YANG Jing (杨静)1, XIE Yue (谢跃)2, ZHAO Li (赵力)3

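  1. Key Laboratory of Grain Information Processing and Control, Ministry of Education, Henan University of Technology (河南工业大学)
  2. School of Information and Communication Engineering, Nanjing Institute of Technology (南京工程学院)
  3. School of Information Science and Engineering, Southeast University (东南大学)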
  • Received: 2021-12-06  Revised: 2022-04-27  Online: 2022-06-13  Published: 2022-06-13
  • Corresponding author: TAO Huawei (陶华伟)
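  • About the authors: WANG Yang (汪洋), born in 1999, male, from Xinyang, Henan, M.S. candidate, CCF student member (H0653G). His research interests include speech signal processing. FU Hongliang (傅洪亮), born in 1965, male, from Anyang, Henan, Ph.D., professor, master's supervisor. His research interests include communication and information systems. TAO Huawei (陶华伟, corresponding author), born in 1987, male, from Zhengzhou, Henan, Ph.D., lecturer, CCF member. His research interests include speech emotion recognition (thw@haut.edu.cn). YANG Jing (杨静), born in 1983, female, from Shangqiu, Henan, Ph.D., associate professor, master's supervisor. Her research interests include communication signal processing. XIE Yue (谢跃), born in 1991, male, Ph.D. His research interests include artificial intelligence and affective computing. ZHAO Li (赵力), born in 1958, male, from Nanjing, Jiangsu, Ph.D., professor, doctoral supervisor. His research interests include speech signal processing and emotional information processing.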
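  • Supported by:
    National Natural Science Foundation of China (62001215); Natural Science Project of Henan Education Department (21A120003); Start-up Fund for High-level Talents of Henan University of Technology (2018BS037)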

Cross-corpus speech emotion recognition based on decision boundary optimized domain adaptation

WANG Yang1, FU Hongliang1, TAO Huawei1*, YANG Jing1, XIE Yue2, ZHAO Li3   

  1. Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education
  2. School of Information and Communication Engineering, Nanjing Institute of Technology
  3. School of Information Science and Engineering, Southeast University
  • Received: 2021-12-06  Revised: 2022-04-27  Online: 2022-06-13  Published: 2022-06-13
  • About the authors: WANG Yang, born in 1999, M.S. candidate. His research interests include speech signal processing. FU Hongliang, born in 1965, Ph.D., professor. His research interests include communication and information systems. TAO Huawei, born in 1987, Ph.D., lecturer. His research interests include speech emotion recognition. YANG Jing, born in 1983, Ph.D., associate professor. Her research interests include communication signal processing. XIE Yue, born in 1991, Ph.D., lecturer. His research interests include artificial intelligence and affective computing. ZHAO Li, born in 1958, Ph.D., professor. His research interests include speech signal processing and emotional information processing.
  • Supported by:
    National Natural Science Foundation of China (62001215), Natural Science Project of Henan Education Department (21A120003), Start-up Fund for High-level Talents of Henan University of Technology (2018BS037).

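Abstract (摘要): Domain adaptation algorithms are widely applied to cross-corpus speech emotion recognition. However, while seeking to reduce domain discrepancy, many of these algorithms lose the discriminability of target-domain samples, so the samples lie at high density along the model's decision boundary and lower the model's performance. To address this, a cross-corpus speech emotion recognition method based on decision boundary optimized domain adaptation (DBODA) is proposed. A convolutional neural network first processes the features, which are then passed to the maximum n-norm and mean discrepancy (MNMD) module. While reducing the inter-domain discrepancy, this module maximizes the nuclear norm of the target-domain emotion prediction probability matrix, which raises the discriminability of target-domain samples and optimizes the decision boundary. In six groups of cross-corpus experiments that use the Berlin, eNTERFACE, and CASIA speech corpora as benchmarks, the average recognition accuracy of the proposed method exceeds that of the other algorithms by 1.28% to 11.01%, showing that the model effectively lowers the sample density at the decision boundary and improves prediction accuracy.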

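Keywords (关键词): cross-corpus speech emotion recognition, convolutional neural network, decision boundary optimization, domain adaptation, feature distribution discrepancy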

Abstract: Domain adaptation algorithms are widely used for cross-corpus speech emotion recognition. However, in pursuing a smaller domain discrepancy, many of them lose the discriminability of target-domain samples, which then cluster densely around the model's decision boundary and degrade its performance. To address this problem, a cross-corpus speech emotion recognition method based on decision boundary optimized domain adaptation (DBODA) is proposed. Features are first processed by a convolutional neural network and then fed into the maximum n-norm and mean discrepancy (MNMD) module, which reduces the inter-domain discrepancy while maximizing the nuclear norm of the target-domain emotion prediction probability matrix, thereby enhancing the discriminability of target-domain samples and optimizing the decision boundary. In six groups of cross-corpus experiments built on the Berlin, eNTERFACE, and CASIA speech databases, the average recognition accuracy of the proposed method is 1.28% to 11.01% higher than that of the other algorithms, indicating that the model effectively reduces the sample density around the decision boundary and improves prediction accuracy.

Key words: cross-corpus speech emotion recognition, convolutional neural network, decision boundary optimization, domain adaptation, feature distribution discrepancy
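
The objective described in the abstract (reduce the inter-domain discrepancy while maximizing the nuclear norm of the target-domain prediction probability matrix) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the linear-kernel form of the mean discrepancy, the division of the nuclear norm by the batch size, and the weight `lam` are all assumptions made for illustration, and the convolutional feature extractor is omitted.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def nuclear_norm(probs):
    """Sum of singular values of a (batch, classes) probability matrix."""
    return float(np.linalg.svd(probs, compute_uv=False).sum())


def linear_mean_discrepancy(src_feats, tgt_feats):
    """Squared distance between domain feature means.

    Assumption: a linear-kernel discrepancy; the abstract does not
    specify which kernel the method uses.
    """
    diff = src_feats.mean(axis=0) - tgt_feats.mean(axis=0)
    return float(diff @ diff)


def mnmd_style_loss(src_feats, tgt_feats, tgt_logits, lam=1.0):
    """Hypothetical combined loss: discrepancy minus weighted nuclear norm.

    Minimizing it pulls the domain means together and pushes target
    predictions toward confident, class-diverse outputs.
    """
    probs = softmax(tgt_logits)
    batch_size = probs.shape[0]
    return (linear_mean_discrepancy(src_feats, tgt_feats)
            - lam * nuclear_norm(probs) / batch_size)


if __name__ == "__main__":
    one_hot = np.eye(3)[[0, 1, 2, 0, 1, 2]]   # confident, balanced predictions
    uniform = np.full((6, 3), 1.0 / 3.0)      # samples sitting on the boundary
    print("confident:", nuclear_norm(one_hot))
    print("ambiguous:", nuclear_norm(uniform))
```

Confident, class-balanced predictions give a larger nuclear norm than uniform ones, which is why maximizing it moves target samples away from the decision boundary.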

CLC number (中图分类号):