Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (2): 349-355. DOI: 10.11772/j.issn.1001-9081.2021122105
Special topic: Artificial Intelligence
Ming XU1,2,3, Linhao LI1,2,3, Qiaoling QI1,2,3, Liqin WANG1,2,3
Received: 2021-12-14
Revised: 2022-05-03
Accepted: 2022-05-13
Online: 2023-02-08
Published: 2023-02-10
Contact: Linhao LI
About author: XU Ming, born in 1996 in Tengzhou, Shandong, M.S. candidate. His research interests include natural language processing and text classification.
Abstract: Abductive reasoning is an important task in Natural Language Inference (NLI): given a beginning observation and an ending observation, it aims to infer a plausible intermediate event (hypothesis) that connects them. Early studies trained inference models on each training sample independently. More recent mainstream work exploits the semantic association among similar training samples and fits the plausibility of a hypothesis to its frequency in the training set, characterizing more precisely how plausible a hypothesis is in different contexts. Building on this, the proposed model characterizes hypothesis plausibility while also imposing difference and relativity constraints between plausible and implausible hypotheses, thereby modeling plausibility and implausibility in both directions, and it realizes overall relativity modeling through a many-to-many training strategy. In addition, since words differ in importance when expressing an event, an attention module over the words of each sample is constructed, yielding an abductive reasoning model based on an Attention Balance List (ABL). Experimental results show that, compared with the L2R2 model, the proposed model improves accuracy and AUC on Abductive Reasoning in narrative Text (ART), the mainstream abductive reasoning dataset, by about 0.46 and 1.36 percentage points respectively, verifying its effectiveness.
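The many-to-many relativity constraint described in the abstract can be sketched as a margin-based ranking loss in which every plausible hypothesis is pushed above every implausible one for the same observation pair. This is an illustrative reconstruction under stated assumptions, not the authors' released code; the function name `balance_list_loss` and the margin value are hypothetical.

```python
def balance_list_loss(pos_scores, neg_scores, margin=1.0):
    """Illustrative many-to-many margin loss (assumed form, not the
    paper's exact objective): every plausible hypothesis score should
    exceed every implausible hypothesis score by at least `margin`."""
    # Pair each plausible score with each implausible score (many-to-many).
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    # Hinge penalty for pairs that violate the margin.
    losses = [max(0.0, margin - (p - n)) for p, n in pairs]
    return sum(losses) / len(losses)
```

With a margin of 1.0, a plausible score of 2.0 against an implausible score of 0.0 incurs no loss, while a gap of only 0.5 incurs a penalty of 0.5, so training pressure is concentrated on pairs whose relative order is still uncertain.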
Ming XU, Linhao LI, Qiaoling QI, Liqin WANG. Abductive reasoning model based on attention balance list[J]. Journal of Computer Applications, 2023, 43(2): 349-355.
| Task | Content | Answer |
|---|---|---|
| NLI | P: A man is inspecting someone's clothes. H: The man is sleeping. | E, N or C |
| NLI | P: An old man and a young man are laughing. H: Two men watch a cat playing on the floor. | E, N or C |
| NLI | P: A soccer game with several men playing. H: Some men are playing a sport. | E, N or C |
| aNLI | O1: Dotty was in a bad mood. H1: Dotty ate something bad. H2: She called a few good friends to chat. O2: Afterwards she felt much better. | H1 or H2 |

Tab. 1 Comparison of NLI and aNLI tasks
Tab. 2 Description of vectors in HInput
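A common way to form a single input sequence (HInput) for a RoBERTa-style scorer is to concatenate the first observation, the candidate hypothesis, and the second observation with separator tokens. The sketch below is an assumed format for illustration only; the exact token layout follows the paper's Table 2, and `build_hinput` is a hypothetical helper name.

```python
def build_hinput(o1, hypothesis, o2, cls="<s>", sep="</s>"):
    """Assumed RoBERTa-style packing of (O1, H, O2) into one sequence;
    the real HInput layout is defined in the paper's Table 2."""
    return f"{cls} {o1} {sep} {hypothesis} {sep} {o2} {sep}"
```

For the aNLI example in Tab. 1, each candidate (H1 and H2) would be packed with the same O1 and O2 and scored separately, and the higher-scoring hypothesis is selected.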
| Description | Content | Training set | Validation set | Test set |
|---|---|---|---|---|
| Total unique events | Context | 17 801 | 1 532 | 3 059 |
| | Plausible hypotheses h+ | 72 046 | 1 532 | 3 059 |
| | Implausible hypotheses h- | 166 820 | 1 532 | 3 059 |
| Average hypotheses per event | Plausible hypotheses h+ | 4.05 | 1.00 | 1.00 |
| | Implausible hypotheses h- | 9.37 | 1.00 | 1.00 |
| Average number of words | Plausible hypotheses h+ | 8.34 | 8.26 | 8.54 |
| | Implausible hypotheses h- | 8.28 | 8.55 | 8.53 |
| | First observation O1 | 8.09 | 8.07 | 8.17 |
| | Second observation O2 | 9.29 | 9.30 | 9.31 |

Tab. 3 Statistics of ART dataset
| Model | Validation Acc | Validation AUC | Test Acc |
|---|---|---|---|
| Human Perf | — | — | 91.40 |
| Majority | 50.80 | — | — |
| GPT | 62.70 | — | 62.30 |
| BERT-Large | 69.10 | 69.03 | 68.90 |
| RoBERTa-Large | 85.76 | 85.02 | 84.48 |
| L2R2 | 88.44 | 87.53 | 86.81 |
| MHKA | 87.85 | — | — |
| ABL | 88.90 | 88.89 | 86.84 |

Tab. 4 Experimental results of different models (unit: %)
| Model | Training set proportion 1% | 2% | 5% | 10% | 100% |
|---|---|---|---|---|---|
| Balance list | 78.33 | 79.96 | 82.51 | 84.33 | 88.51 |
| Balance list + attention | 79.24 | 80.81 | 83.03 | 84.92 | 88.90 |

Tab. 5 Acc results of adding attention mechanism (unit: %)
| Model | Training set proportion 1% | 2% | 5% | 10% | 100% |
|---|---|---|---|---|---|
| Balance list | 75.08 | 77.03 | 80.55 | 83.00 | 88.19 |
| Balance list + attention | 76.53 | 78.20 | 81.81 | 83.65 | 88.89 |

Tab. 6 AUC results of adding attention mechanism (unit: %)
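The AUC reported in Tabs. 4-6 can be read as a pairwise ranking quality: the fraction of (plausible, implausible) score pairs that are ordered correctly. The sketch below is an illustrative computation under that assumption; `pairwise_auc` is a hypothetical name, not the paper's evaluation script.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (plausible, implausible) pairs ranked correctly;
    ties count as half, matching the standard rank-based AUC."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

A model that always scores plausible hypotheses above implausible ones reaches 1.0, while random scoring hovers around 0.5, which is why the roughly 1.36-percentage-point AUC gain over L2R2 reflects consistently better orderings rather than a few corrected examples.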
[1] MINSKY M. Deep issues: commonsense-based interfaces[J]. Communications of the ACM, 2000, 43(8): 66-73. DOI: 10.1145/345124.345145.
[2] DAVIS E, MARCUS G. Commonsense reasoning and commonsense knowledge in artificial intelligence[J]. Communications of the ACM, 2015, 58(9): 92-103. DOI: 10.1145/2701413.
[3] BHAGAVATULA C, LE BRAS R, MALAVIYA C, et al. Abductive commonsense reasoning[EB/OL]. (2020-02-14) [2021-12-10].
[4] BOWMAN S R, ANGELI G, POTTS C, et al. A large annotated corpus for learning natural language inference[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2015: 632-642. DOI: 10.18653/v1/d15-1075.
[5] WILLIAMS A, NANGIA N, BOWMAN S R. A broad-coverage challenge corpus for sentence understanding through inference[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: ACL, 2018: 1112-1122. DOI: 10.18653/v1/n18-1101.
[6] MacCARTNEY B, MANNING C D. Natural logic for textual inference[C]// Proceedings of the 2007 ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Stroudsburg, PA: ACL, 2007: 193-200. DOI: 10.3115/1654536.1654575.
[7] ZELLERS R, BISK Y, SCHWARTZ R, et al. SWAG: a large-scale adversarial dataset for grounded commonsense inference[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2018: 93-104. DOI: 10.18653/v1/d18-1009.
[8] ZHU Y C, PANG L, LAN Y Y, et al. L2R2: leveraging ranking for abductive reasoning[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020: 1961-1964. DOI: 10.1145/3397271.3401332.
[9] YU C L, ZHANG H M, SONG Y Q, et al. Enriching large-scale eventuality knowledge graph with entailment relations[C/OL]// Proceedings of the 2020 Conference on Automated Knowledge Base Construction. [2021-12-10].
[10] BAUER L, BANSAL M. Identify, align, and integrate: matching knowledge graphs to commonsense reasoning tasks[C]// Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2021: 2259-2272. DOI: 10.18653/v1/2021.eacl-main.192.
[11] MA K X, ILIEVSKI F, FRANCIS J, et al. Knowledge-driven data construction for zero-shot evaluation in commonsense question answering[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 13507-13515. DOI: 10.1609/aaai.v35i15.17593.
[12] HUANG Y C, ZHANG Y Z, ELACHQAR O, et al. INSET: sentence infilling with INter-SEntential Transformer[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 2502-2515. DOI: 10.18653/v1/2020.acl-main.226.
[13] ZHOU W C S, LEE D H, SELVAM R K, et al. Pre-training text-to-text transformers for concept-centric common sense[EB/OL]. (2022-02-10) [2022-11-10].
[14] YU C W, CHEN J W, CHEN Y L. Enhanced LSTM framework for water-cooled chiller COP forecasting[C]// Proceedings of the 2021 IEEE International Conference on Consumer Electronics. Piscataway: IEEE, 2021: 1-3. DOI: 10.1109/icce50685.2021.9427706.
[15] YUE Z Y, YE X, LIU R H. A survey of language model based pre-training technology[J]. Journal of Chinese Information Processing, 2021, 35(9): 15-29. DOI: 10.3969/j.issn.1003-0077.2021.09.002.
[16] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 4171-4186. DOI: 10.18653/v1/N19-1423.
[17] LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26) [2021-12-10].
[18] CHEN Q, ZHU X D, LING Z H, et al. Enhanced LSTM for natural language inference[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2017: 1657-1668. DOI: 10.18653/v1/p17-1152.
[19] HERBRICH R, GRAEPEL T, OBERMAYER K. Large margin rank boundaries for ordinal regression[M]// SMOLA A J, BARTLETT P L, SCHÖLKOPF B, et al. Advances in Large Margin Classifiers. Cambridge: MIT Press, 2000: 115-132. DOI: 10.7551/mitpress/1113.003.0010.
[20] BURGES C, SHAKED T, RENSHAW E, et al. Learning to rank using gradient descent[C]// Proceedings of the 22nd International Conference on Machine Learning. New York: ACM, 2005: 89-96. DOI: 10.1145/1102351.1102363.
[21] BURGES C J C, RAGNO R, LE Q V. Learning to rank with nonsmooth cost functions[C]// Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2006: 193-200. DOI: 10.7551/mitpress/7503.003.0029.
[22] CAO Z, QIN T, LIU T Y, et al. Learning to rank: from pairwise approach to listwise approach[C]// Proceedings of the 24th International Conference on Machine Learning. New York: ACM, 2007: 129-136. DOI: 10.1145/1273496.1273513.
[23] LI M H, LIU X L, van de WEIJER J, et al. Learning to rank for active learning: a listwise approach[C]// Proceedings of the 25th International Conference on Pattern Recognition. Piscataway: IEEE, 2021: 5587-5594. DOI: 10.1109/icpr48806.2021.9412680.
[24] QIN T, LIU T Y, LI H. A general approximation framework for direct optimization of information retrieval measures[J]. Information Retrieval, 2010, 13(4): 375-397. DOI: 10.1007/s10791-009-9124-x.
[25] PAUL D, FRANK A. Social commonsense reasoning with multi-head knowledge attention[C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA: ACL, 2020: 2969-2980. DOI: 10.18653/v1/2020.findings-emnlp.267.
[26] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07) [2021-12-10].
[27] PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1532-1543. DOI: 10.3115/v1/d14-1162.
[28] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Stroudsburg, PA: ACL, 2018: 2227-2237. DOI: 10.18653/v1/n18-1202.
[29] HE P C, LIU X D, GAO J F, et al. DeBERTa: decoding-enhanced BERT with disentangled attention[EB/OL]. (2021-10-06) [2021-12-10].
[30] LI W, GAO C, NIU G C, et al. UNIMO: towards unified-modal understanding and generation via cross-modal contrastive learning[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2021: 2592-2607. DOI: 10.18653/v1/2021.acl-long.202.