Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (2): 349-355.DOI: 10.11772/j.issn.1001-9081.2021122105
Special Issue: Artificial Intelligence
• Artificial Intelligence •
Ming XU 1,2,3, Linhao LI 1,2,3, Qiaoling QI 1,2,3, Liqin WANG 1,2,3
Received: 2021-12-14
Revised: 2022-05-03
Accepted: 2022-05-13
Online: 2023-02-08
Published: 2023-02-10
Contact: Linhao LI
About author: XU Ming, born in 1996, M. S. candidate. His research interests include natural language processing and text classification.
Ming XU, Linhao LI, Qiaoling QI, Liqin WANG. Abductive reasoning model based on attention balance list[J]. Journal of Computer Applications, 2023, 43(2): 349-355.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021122105
| Task | Content | Answer |
| --- | --- | --- |
| NLI | P: A man is inspecting a person's clothes. H: The man is sleeping. | E, N, or C |
| NLI | P: An old man and a young man are laughing. H: Two men are watching a cat playing on the floor. | E, N, or C |
| NLI | P: A soccer game with multiple males participating. H: Some men are playing a sport. | E, N, or C |
| aNLI | O1: Dotty was in a bad mood. H1: Dotty had eaten something bad. H2: She called a few good friends to chat. O2: Afterwards, she felt much better. | H1 or H2 |

(E: entailment; N: neutral; C: contradiction)
Tab. 1 Comparison of NLI and aNLI tasks
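The aNLI setup in Tab. 1 reduces to choosing which of two candidate hypotheses better explains the transition from observation O1 to observation O2. A minimal sketch of that decision rule, using a toy word-overlap scorer as a stand-in for the paper's trained model (the `score` function here is an illustrative assumption, not the actual scoring network):

```python
# Hypothetical aNLI decision rule: given observations O1, O2 and candidate
# hypotheses H1, H2, pick the hypothesis that receives the higher
# plausibility score when placed between the two observations.

def score(o1: str, hypothesis: str, o2: str) -> float:
    """Toy plausibility scorer: fraction of hypothesis words that also
    appear in the surrounding observations (illustration only)."""
    context = set((o1 + " " + o2).lower().split())
    hyp = set(hypothesis.lower().split())
    return len(context & hyp) / max(len(hyp), 1)

def choose_hypothesis(o1: str, h1: str, h2: str, o2: str) -> str:
    return "H1" if score(o1, h1, o2) >= score(o1, h2, o2) else "H2"

o1 = "Dotty was in a bad mood"
o2 = "She felt much better afterwards"
h1 = "Dotty ate something that upset her stomach"
h2 = "Dotty called up some good friends to chat"
print(choose_hypothesis(o1, h1, h2, o2))
```

In the real model the scorer is a pretrained encoder with a classification head rather than lexical overlap; only the selection logic carries over.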
Tab. 2 Description of vectors in HInput
| Overall description | Item | Train | Dev | Test |
| --- | --- | --- | --- | --- |
| Total number of unique events | Context | 17 801 | 1 532 | 3 059 |
| | Plausible hypotheses h+ | 72 046 | 1 532 | 3 059 |
| | Implausible hypotheses h- | 166 820 | 1 532 | 3 059 |
| Average number of hypotheses per event | Plausible hypotheses h+ | 4.05 | 1.00 | 1.00 |
| | Implausible hypotheses h- | 9.37 | 1.00 | 1.00 |
| Average number of words | Plausible hypotheses h+ | 8.34 | 8.26 | 8.54 |
| | Implausible hypotheses h- | 8.28 | 8.55 | 8.53 |
| | First observation O1 | 8.09 | 8.07 | 8.17 |
| | Second observation O2 | 9.29 | 9.30 | 9.31 |
Tab. 3 Statistics of ART dataset
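Statistics like those in Tab. 3 (hypotheses per event, average sentence lengths) can be derived directly from the dataset records. A small sketch over an ART-style record list; the field names (`obs1`, `plausible`, etc.) are assumptions for illustration, not the dataset's actual schema:

```python
# Sketch: computing per-split statistics of the kind reported in Tab. 3
# from a list of ART-style records. The toy record below stands in for a
# full split of the dataset.
from statistics import mean

records = [
    {"obs1": "Dotty was in a bad mood",
     "obs2": "She felt much better afterwards",
     "plausible": ["Dotty talked to friends"],
     "implausible": ["Dotty ate bad food", "Dotty lost her keys"]},
]

def word_count(sentence: str) -> int:
    return len(sentence.split())

stats = {
    "avg_plausible_per_event": mean(len(r["plausible"]) for r in records),
    "avg_implausible_per_event": mean(len(r["implausible"]) for r in records),
    "avg_words_obs1": mean(word_count(r["obs1"]) for r in records),
    "avg_words_obs2": mean(word_count(r["obs2"]) for r in records),
}
print(stats)
```

Running the same aggregation over the real train/dev/test splits would reproduce the per-split columns of Tab. 3.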
| Model | Dev Acc | Dev AUC | Test Acc |
| --- | --- | --- | --- |
| Human Perf | — | — | 91.40 |
| Majority | 50.80 | — | — |
| GPT | 62.70 | — | 62.30 |
| BERT-Large | 69.10 | 69.03 | 68.90 |
| RoBERTa-Large | 85.76 | 85.02 | 84.48 |
| L2R2 | 88.44 | 87.53 | 86.81 |
| MHKA | 87.85 | — | — |
| ABL | 88.90 | 88.89 | 86.84 |
Tab. 4 Experimental results of different models
| Model | 1% | 2% | 5% | 10% | 100% |
| --- | --- | --- | --- | --- | --- |
| Balance list | 78.33 | 79.96 | 82.51 | 84.33 | 88.51 |
| Balance list + attention mechanism | 79.24 | 80.81 | 83.03 | 84.92 | 88.90 |

(Columns: proportion of the training set used)
Tab. 5 Results of adding attention mechanism on Acc
| Model | 1% | 2% | 5% | 10% | 100% |
| --- | --- | --- | --- | --- | --- |
| Balance list | 75.08 | 77.03 | 80.55 | 83.00 | 88.19 |
| Balance list + attention mechanism | 76.53 | 78.20 | 81.81 | 83.65 | 88.89 |

(Columns: proportion of the training set used)
Tab. 6 Results of adding attention mechanism on AUC
1. MINSKY M. Deep issues: commonsense-based interfaces[J]. Communications of the ACM, 2000, 43(8): 66-73. DOI: 10.1145/345124.345145.
2. DAVIS E, MARCUS G. Commonsense reasoning and commonsense knowledge in artificial intelligence[J]. Communications of the ACM, 2015, 58(9): 92-103. DOI: 10.1145/2701413.
3. BHAGAVATULA C, LE BRAS R, MALAVIYA C, et al. Abductive commonsense reasoning[EB/OL]. (2020-02-14) [2021-12-10].
4. BOWMAN S R, ANGELI G, POTTS C, et al. A large annotated corpus for learning natural language inference[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2015: 632-642. DOI: 10.18653/v1/D15-1075.
5. WILLIAMS A, NANGIA N, BOWMAN S R. A broad-coverage challenge corpus for sentence understanding through inference[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: ACL, 2018: 1112-1122. DOI: 10.18653/v1/N18-1101.
6. MacCARTNEY B, MANNING C D. Natural logic for textual inference[C]// Proceedings of the 2007 ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Stroudsburg, PA: ACL, 2007: 193-200. DOI: 10.3115/1654536.1654575.
7. ZELLERS R, BISK Y, SCHWARTZ R, et al. SWAG: a large-scale adversarial dataset for grounded commonsense inference[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2018: 93-104. DOI: 10.18653/v1/D18-1009.
8. ZHU Y C, PANG L, LAN Y Y, et al. L2R2: leveraging ranking for abductive reasoning[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020: 1961-1964. DOI: 10.1145/3397271.3401332.
9. YU C L, ZHANG H M, SONG Y Q, et al. Enriching large-scale eventuality knowledge graph with entailment relations[C/OL]// Proceedings of the 2020 Conference on Automated Knowledge Base Construction. [2021-12-10].
10. BAUER L, BANSAL M. Identify, align, and integrate: matching knowledge graphs to commonsense reasoning tasks[C]// Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2021: 2259-2272. DOI: 10.18653/v1/2021.eacl-main.192.
11. MA K X, ILIEVSKI F, FRANCIS J, et al. Knowledge-driven data construction for zero-shot evaluation in commonsense question answering[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 13507-13515. DOI: 10.1609/aaai.v35i15.17593.
12. HUANG Y C, ZHANG Y Z, ELACHQAR O, et al. INSET: sentence infilling with INter-SEntential Transformer[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 2502-2515. DOI: 10.18653/v1/2020.acl-main.226.
13. ZHOU W C S, LEE D H, SELVAM R K, et al. Pre-training text-to-text transformers for concept-centric common sense[EB/OL]. (2022-02-10) [2022-11-10].
14. YU C W, CHEN J W, CHEN Y L. Enhanced LSTM framework for water-cooled chiller COP forecasting[C]// Proceedings of the 2021 IEEE International Conference on Consumer Electronics. Piscataway: IEEE, 2021: 1-3. DOI: 10.1109/ICCE50685.2021.9427706.
15. YUE Z Y, YE X, LIU R H. A survey of language model based pre-training technology[J]. Journal of Chinese Information Processing, 2021, 35(9): 15-29. (in Chinese) DOI: 10.3969/j.issn.1003-0077.2021.09.002.
16. DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 4171-4186. DOI: 10.18653/v1/N19-1423.
17. LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26) [2021-12-10].
18. CHEN Q, ZHU X D, LING Z H, et al. Enhanced LSTM for natural language inference[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2017: 1657-1668. DOI: 10.18653/v1/P17-1152.
19. HERBRICH R, GRAEPEL T, OBERMAYER K. Large margin rank boundaries for ordinal regression[M]// SMOLA A J, BARTLETT P L, SCHÖLKOPF B, et al. Advances in Large Margin Classifiers. Cambridge: MIT Press, 2000: 115-132. DOI: 10.7551/mitpress/1113.003.0010.
20. BURGES C, SHAKED T, RENSHAW E, et al. Learning to rank using gradient descent[C]// Proceedings of the 22nd International Conference on Machine Learning. New York: ACM, 2005: 89-96. DOI: 10.1145/1102351.1102363.
21. BURGES C J C, RAGNO R, LE Q V. Learning to rank with nonsmooth cost functions[C]// Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2006: 193-200. DOI: 10.7551/mitpress/7503.003.0029.
22. CAO Z, QIN T, LIU T Y, et al. Learning to rank: from pairwise approach to listwise approach[C]// Proceedings of the 24th International Conference on Machine Learning. New York: ACM, 2007: 129-136. DOI: 10.1145/1273496.1273513.
23. LI M H, LIU X L, van de WEIJER J, et al. Learning to rank for active learning: a listwise approach[C]// Proceedings of the 25th International Conference on Pattern Recognition. Piscataway: IEEE, 2021: 5587-5594. DOI: 10.1109/ICPR48806.2021.9412680.
24. QIN T, LIU T Y, LI H. A general approximation framework for direct optimization of information retrieval measures[J]. Information Retrieval, 2010, 13(4): 375-397. DOI: 10.1007/s10791-009-9124-x.
25. PAUL D, FRANK A. Social commonsense reasoning with multi-head knowledge attention[C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA: ACL, 2020: 2969-2980. DOI: 10.18653/v1/2020.findings-emnlp.267.
26. MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07) [2021-12-10].
27. PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1532-1543. DOI: 10.3115/v1/D14-1162.
28. PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Stroudsburg, PA: ACL, 2018: 2227-2237. DOI: 10.18653/v1/N18-1202.
29. HE P C, LIU X D, GAO J F, et al. DeBERTa: decoding-enhanced BERT with disentangled attention[EB/OL]. (2021-10-06) [2021-12-10].
30. LI W, GAO C, NIU G C, et al. UNIMO: towards unified-modal understanding and generation via cross-modal contrastive learning[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2021: 2592-2607. DOI: 10.18653/v1/2021.acl-long.202.