Multi-feature fusion attention-based hierarchical classification method for dialogue act

Journal of Computer Applications, 2024, Vol. 44, Issue (3): 715-721. DOI: 10.11772/j.issn.1001-9081.2023030358

Zongze JIA, Pengfei GAO, Yinglong MA, Xiaofeng LIU, Haixin XIA

School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China

Received: 2023-04-04 Revised: 2023-06-06 Accepted: 2023-06-08 Online: 2023-07-04 Published: 2024-03-10
Contact: Yinglong MA
About the authors: JIA Zongze, born in 1997, native of Yuncheng, Shanxi, M.S. candidate. His research interests include natural language processing.
GAO Pengfei, born in 1996, native of Jining, Shandong, M.S. candidate. His research interests include machine learning.
LIU Xiaofeng, born in 1994, native of Datong, Shanxi, Ph.D. candidate. His research interests include natural language processing.
XIA Haixin, born in 1997, native of Zhangjiakou, Hebei, M.S. candidate. His research interests include knowledge graphs.
Supported by: Project of Science and Technology Department of State Grid (SGGSXT00XMJS2250023)

Abstract

Deep learning models are widely used in dialogue act recognition, where they improve classification performance by mining various dialogue act features. However, existing methods neglect the latent associations and interactions between different dialogue act features, and they seldom consider the semantic relevance between dialogue act labels during classification. Both shortcomings limit recognition performance. To address these problems, a Multi-feature Fusion Attention-based Hierarchical Classification (MFA-HC) method for dialogue act recognition is proposed. First, a hierarchical dialogue act classification framework based on learning without forgetting is proposed, which combines fine-grained features such as words, parts of speech, and related linguistic statistics to train the dialogue act classification model. Second, a universality-individuality model based on the attention mechanism is proposed to capture the universality and individuality features among the different features. Experimental results on two benchmark datasets, SwDA (Switchboard Dialogue Act corpus) and MRDA (ICSI Meeting Recorder Dialogue Act corpus), show that, compared with DARER (Dual-tAsk temporal Relational rEcurrent Reasoning network), which has strong overall performance among current methods, MFA-HC improves classification accuracy by 0.6% and 0.1%, respectively, by capturing the universality and individuality features hidden in utterances.

Key words: dialogue act, feature representation, feature fusion, multi-feature, hierarchical classification
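To make the fusion idea in the abstract more concrete, here is a minimal sketch. It is not the authors' universality-individuality model. It assumes each feature type (words, parts of speech, linguistic statistics) has already been encoded as a vector of the same size, treats an attention-weighted sum as the shared ("universality") representation, and treats each feature's residual from that sum as its "individuality" part. The function names, the dot-product scoring against a query vector, and the residual definition are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse equal-length feature vectors with dot-product attention.

    Returns (weights, shared, residuals):
      weights   - one attention weight per feature vector
      shared    - attention-weighted sum (assumed "universality" part)
      residuals - feature minus shared (assumed "individuality" part)
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    shared = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    residuals = [[f[d] - shared[d] for d in range(dim)] for f in features]
    return weights, shared, residuals


# Toy 2-dimensional encodings (made-up values, not from the paper).
word_vec = [1.0, 0.0]
pos_vec = [0.0, 1.0]
stat_vec = [1.0, 1.0]

weights, shared, residuals = attention_fuse(
    [word_vec, pos_vec, stat_vec], query=[1.0, 1.0]
)
print(weights, shared)
```

In this toy input the third vector aligns best with the query, so it receives the largest weight. A trained model would learn the query and the encoders rather than fixing them by hand.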

CLC number: TP391.1

Zongze JIA, Pengfei GAO, Yinglong MA, Xiaofeng LIU, Haixin XIA. Multi-feature fusion attention-based hierarchical classification method for dialogue act[J]. Journal of Computer Applications, 2024, 44(3): 715-721.

Figures and tables (8)

Fig. 1 Example of hierarchical structure of DA category labels

Fig. 2 Classification model architectures of level l-1 and level l

Fig. 3 UIM architecture

Tab. 1 Information of experiment datasets

Dataset	$|C|$	$|V|$ /10³	Training dialogues	Training utterances /10³	Validation dialogues	Validation utterances /10³	Test dialogues	Test utterances /10³
MRDA	5	10	51	76	11	15	11	15
SwDA	42	19	1003	173	112	22	19	4

Tab. 2 Result comparison of different models on SwDA and MRDA (%)

Type	Model	MRDA Acc	MRDA F₁	SwDA Acc	SwDA F₁
Flat classifier	CNN-prosody	84.7	79.3	75.1	70.6
Flat classifier	STM	91.4	87.1	83.2	79.1
Flat classifier	SPARTA	90.2	85.9	80.1	76.2
Flat classifier	MDOM	91.9	87.6	81.6	77.8
Flat classifier	DARER	93.2	88.6	83.9	79.2
Hierarchical classifier	Bi-LSTM-CRF	90.9	85.6	79.2	74.3
Hierarchical classifier	NSIM	89.9	85.1	80.5	76.1
Hierarchical classifier	BiRNN-attention	91.1	87.9	82.9	79.4
Hierarchical classifier	Dual-attention	92.2	88.1	82.3	78.6
Hierarchical classifier	HLSN	90.5	86.3	82.9	76.9
Hierarchical classifier	UIIM	89.9	85.7	78.6	74.8
Proposed method	MFA-HC	93.3	88.5	84.4	79.5
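
The abstract reports gains of 0.6% and 0.1% over DARER, while the accuracy columns of Tab. 2 differ by 0.5 and 0.1 percentage points. One reading consistent with both, offered here as an assumption rather than something the page states, is that the abstract's figures are relative improvements. The snippet below computes both kinds of gain using only the DARER and MFA-HC accuracy values from Tab. 2.

```python
# Accuracy (%) for the two models, copied from Tab. 2.
acc = {
    "DARER": {"SwDA": 83.9, "MRDA": 93.2},
    "MFA-HC": {"SwDA": 84.4, "MRDA": 93.3},
}


def gains(baseline, ours):
    """Return (absolute gain in points, relative gain in %), rounded to 1 decimal."""
    absolute = ours - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


results = {d: gains(acc["DARER"][d], acc["MFA-HC"][d]) for d in ("SwDA", "MRDA")}
print(results)  # {'SwDA': (0.5, 0.6), 'MRDA': (0.1, 0.1)}
```

Under this reading, the 0.6% figure for SwDA corresponds to 0.5 points on an 83.9% baseline.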

Tab. 3 Accuracies of MFA-HC and other hierarchical classification models on SwDA (%)

Type	Model	Level 1	Level 2	Level 3
Hierarchical classifier	Bi-LSTM-CRF	90.9	86.1	79.2
Hierarchical classifier	NSIM	92.1	87.4	80.5
Hierarchical classifier	BiRNN-attention	94.4	89.3	82.9
Hierarchical classifier	Dual-attention	94.7	89.1	82.3
Hierarchical classifier	HLSN	93.9	88.8	81.9
Proposed method	MFA-HC	94.6	90.1	84.4
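
The abstract says the hierarchical framework is based on learning without forgetting (LwF), and Tab. 3 reports accuracy level by level. As a rough illustration of the general LwF objective, and not of how MFA-HC connects one level to the next, the sketch below adds a distillation term to the usual cross-entropy: the first term fits the current level's gold label, and the second keeps the model's outputs for the previous level close to outputs recorded before training on the new level. The weight `lam`, the temperature `T`, and all logits are made-up values.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs):
    """H(target, pred) = -sum_i target_i * log(pred_i)."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, pred_probs) if t > 0)


def lwf_loss(new_logits, new_label, old_logits_now, old_logits_recorded,
             lam=1.0, T=2.0):
    """Cross-entropy on the new task plus lam * distillation on the old task."""
    one_hot = [1.0 if i == new_label else 0.0 for i in range(len(new_logits))]
    new_term = cross_entropy(one_hot, softmax(new_logits))
    old_term = cross_entropy(
        softmax(old_logits_recorded, T), softmax(old_logits_now, T)
    )
    return new_term + lam * old_term


# Same new-task prediction; only the old-task head differs between the two calls.
loss_same = lwf_loss([2.0, 0.0, 0.0], 0, [1.0, -1.0], [1.0, -1.0])
loss_drift = lwf_loss([2.0, 0.0, 0.0], 0, [-1.0, 1.0], [1.0, -1.0])
print(loss_same < loss_drift)  # True
```

The comparison shows the intended effect of the distillation term: drifting away from the recorded previous-level outputs raises the loss even when the new-level prediction is unchanged.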

Tab. 4 Accuracy comparison with different intervals of utterance length classification on SwDA and MRDA

Interval size	MRDA Acc/%	SwDA Acc/%
2	91.6	81.9
3	92.2	83.1
4	92.9	83.9
5	93.3	84.4
6	92.7	83.7
7	91.4	82.6
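
Tab. 4 varies the interval size used to turn utterance length into a class. The page does not define the binning, so the sketch below assumes the simplest scheme: length is counted in whitespace-separated tokens and mapped to fixed-width bins of `interval` tokens. `length_class` is a hypothetical helper, and the example utterances are invented.

```python
def length_class(utterance, interval=5):
    """Map an utterance to a length-bin index.

    Bin k covers token counts k*interval + 1 .. (k + 1)*interval;
    an empty utterance falls into bin 0.
    """
    n_tokens = len(utterance.split())
    if n_tokens == 0:
        return 0
    return (n_tokens - 1) // interval


examples = [
    "yeah",                                              # 1 token
    "i think that is right",                             # 5 tokens
    "well i was not really sure what he meant by that",  # 11 tokens
]
bins = [length_class(u, interval=5) for u in examples]
print(bins)  # [0, 0, 2]
```

The default `interval=5` mirrors the best-performing row of Tab. 4 on both datasets.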

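Tab. 5 Results of ablation study on different test sets (Accuracy, %)

Type	Model	MRDA	SwDA
Feature	w/o part-of-speech features	90.9	82.6
Feature	w/o linguistic statistical features	92.2	83.9
Feature	w/o part-of-speech features & linguistic statistical features	88.3	81.2
Fusion	w/o universality	91.0	82.0
Fusion	w/o individuality	90.7	82.1
Fusion	w/o UIM	88.2	80.3
Full model	MFA-HC	93.3	84.4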
