Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (2): 349-355.DOI: 10.11772/j.issn.1001-9081.2021122105
Special Issue: Artificial Intelligence
• Artificial Intelligence •
Ming XU 1,2,3, Linhao LI 1,2,3, Qiaoling QI 1,2,3, Liqin WANG 1,2,3
Received: 2021-12-14
Revised: 2022-05-03
Accepted: 2022-05-13
Online: 2023-02-08
Published: 2023-02-10
Contact: Linhao LI
About author: XU Ming, born in 1996, M. S. candidate. His research interests include natural language processing and text classification.
Ming XU, Linhao LI, Qiaoling QI, Liqin WANG. Abductive reasoning model based on attention balance list[J]. Journal of Computer Applications, 2023, 43(2): 349-355.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021122105
| Task | Content | Answer |
| --- | --- | --- |
| NLI | P: A man is inspecting a person's clothes. H: The man is sleeping. | E, N, or C |
| NLI | P: An old man and a young man are laughing. H: Two men are watching a cat playing on the floor. | E, N, or C |
| NLI | P: A soccer game with multiple males participating. H: Some men are playing a sport. | E, N, or C |
| aNLI | O1: Dotty was in a bad mood. H1: Dotty had eaten something bad. H2: She called a few good friends to chat. O2: Afterwards, she felt much better. | H1 or H2 |

(E: entailment; N: neutral; C: contradiction)
Tab. 1 Comparison of NLI and aNLI tasks
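The aNLI setup in Tab. 1 reduces to choosing which of two candidate hypotheses better explains the transition from observation O1 to observation O2. A minimal sketch of that decision rule, using a toy word-overlap scorer as a stand-in for the paper's trained model (the `score` function here is an illustrative assumption, not the actual scoring network):

```python
# Hypothetical aNLI decision rule: given observations O1, O2 and candidate
# hypotheses H1, H2, pick the hypothesis that receives the higher
# plausibility score when placed between the two observations.

def score(o1: str, hypothesis: str, o2: str) -> float:
    """Toy plausibility scorer: fraction of hypothesis words that also
    appear in the surrounding observations (illustration only)."""
    context = set((o1 + " " + o2).lower().split())
    hyp = set(hypothesis.lower().split())
    return len(context & hyp) / max(len(hyp), 1)

def choose_hypothesis(o1: str, h1: str, h2: str, o2: str) -> str:
    return "H1" if score(o1, h1, o2) >= score(o1, h2, o2) else "H2"

o1 = "Dotty was in a bad mood"
o2 = "She felt much better afterwards"
h1 = "Dotty ate something that upset her stomach"
h2 = "Dotty called up some good friends to chat"
print(choose_hypothesis(o1, h1, h2, o2))
```

In the real model the scorer is a pretrained encoder with a classification head rather than lexical overlap; only the selection logic carries over.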
Tab. 2 Description of vectors in HInput
| Overall description | Item | Train | Dev | Test |
| --- | --- | --- | --- | --- |
| Total number of unique events | Context | 17 801 | 1 532 | 3 059 |
| | Plausible hypotheses h+ | 72 046 | 1 532 | 3 059 |
| | Implausible hypotheses h- | 166 820 | 1 532 | 3 059 |
| Average number of hypotheses per event | Plausible hypotheses h+ | 4.05 | 1.00 | 1.00 |
| | Implausible hypotheses h- | 9.37 | 1.00 | 1.00 |
| Average number of words | Plausible hypotheses h+ | 8.34 | 8.26 | 8.54 |
| | Implausible hypotheses h- | 8.28 | 8.55 | 8.53 |
| | First observation O1 | 8.09 | 8.07 | 8.17 |
| | Second observation O2 | 9.29 | 9.30 | 9.31 |
Tab. 3 Statistics of ART dataset
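Statistics like those in Tab. 3 (hypotheses per event, average sentence lengths) can be derived directly from the dataset records. A small sketch over an ART-style record list; the field names (`obs1`, `plausible`, etc.) are assumptions for illustration, not the dataset's actual schema:

```python
# Sketch: computing per-split statistics of the kind reported in Tab. 3
# from a list of ART-style records. The toy record below stands in for a
# full split of the dataset.
from statistics import mean

records = [
    {"obs1": "Dotty was in a bad mood",
     "obs2": "She felt much better afterwards",
     "plausible": ["Dotty talked to friends"],
     "implausible": ["Dotty ate bad food", "Dotty lost her keys"]},
]

def word_count(sentence: str) -> int:
    return len(sentence.split())

stats = {
    "avg_plausible_per_event": mean(len(r["plausible"]) for r in records),
    "avg_implausible_per_event": mean(len(r["implausible"]) for r in records),
    "avg_words_obs1": mean(word_count(r["obs1"]) for r in records),
    "avg_words_obs2": mean(word_count(r["obs2"]) for r in records),
}
print(stats)
```

Running the same aggregation over the real train/dev/test splits would reproduce the per-split columns of Tab. 3.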
| Model | Dev Acc | Dev AUC | Test Acc |
| --- | --- | --- | --- |
| Human Perf | — | — | 91.40 |
| Majority | 50.80 | — | — |
| GPT | 62.70 | — | 62.30 |
| BERT-Large | 69.10 | 69.03 | 68.90 |
| RoBERTa-Large | 85.76 | 85.02 | 84.48 |
| L2R2 | 88.44 | 87.53 | 86.81 |
| MHKA | 87.85 | — | — |
| ABL | 88.90 | 88.89 | 86.84 |
Tab. 4 Experimental results of different models
| Model | 1% | 2% | 5% | 10% | 100% |
| --- | --- | --- | --- | --- | --- |
| Balance list | 78.33 | 79.96 | 82.51 | 84.33 | 88.51 |
| Balance list + attention mechanism | 79.24 | 80.81 | 83.03 | 84.92 | 88.90 |

(Columns: proportion of the training set used)
Tab. 5 Results of adding attention mechanism on Acc
| Model | 1% | 2% | 5% | 10% | 100% |
| --- | --- | --- | --- | --- | --- |
| Balance list | 75.08 | 77.03 | 80.55 | 83.00 | 88.19 |
| Balance list + attention mechanism | 76.53 | 78.20 | 81.81 | 83.65 | 88.89 |

(Columns: proportion of the training set used)
Tab. 6 Results of adding attention mechanism on AUC
1. MINSKY M. Deep issues: commonsense-based interfaces[J]. Communications of the ACM, 2000, 43(8): 66-73. DOI: 10.1145/345124.345145.
2. DAVIS E, MARCUS G. Commonsense reasoning and commonsense knowledge in artificial intelligence[J]. Communications of the ACM, 2015, 58(9): 92-103. DOI: 10.1145/2701413.
3. BHAGAVATULA C, LE BRAS R, MALAVIYA C, et al. Abductive commonsense reasoning[EB/OL]. (2020-02-14) [2021-12-10].
4. BOWMAN S R, ANGELI G, POTTS C, et al. A large annotated corpus for learning natural language inference[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2015: 632-642. DOI: 10.18653/v1/D15-1075.
5. WILLIAMS A, NANGIA N, BOWMAN S R. A broad-coverage challenge corpus for sentence understanding through inference[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: ACL, 2018: 1112-1122. DOI: 10.18653/v1/N18-1101.
6. MacCARTNEY B, MANNING C D. Natural logic for textual inference[C]// Proceedings of the 2007 ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Stroudsburg, PA: ACL, 2007: 193-200. DOI: 10.3115/1654536.1654575.
7. ZELLERS R, BISK Y, SCHWARTZ R, et al. SWAG: a large-scale adversarial dataset for grounded commonsense inference[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2018: 93-104. DOI: 10.18653/v1/D18-1009.
8. ZHU Y C, PANG L, LAN Y Y, et al. L2R2: leveraging ranking for abductive reasoning[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020: 1961-1964. DOI: 10.1145/3397271.3401332.
9. YU C L, ZHANG H M, SONG Y Q, et al. Enriching large-scale eventuality knowledge graph with entailment relations[C/OL]// Proceedings of the 2020 Conference on Automated Knowledge Base Construction. [2021-12-10].
10. BAUER L, BANSAL M. Identify, align, and integrate: matching knowledge graphs to commonsense reasoning tasks[C]// Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2021: 2259-2272. DOI: 10.18653/v1/2021.eacl-main.192.
11. MA K X, ILIEVSKI F, FRANCIS J, et al. Knowledge-driven data construction for zero-shot evaluation in commonsense question answering[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 13507-13515. DOI: 10.1609/aaai.v35i15.17593.
12. HUANG Y C, ZHANG Y Z, ELACHQAR O, et al. INSET: sentence infilling with INter-SEntential Transformer[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 2502-2515. DOI: 10.18653/v1/2020.acl-main.226.
13. ZHOU W C S, LEE D H, SELVAM R K, et al. Pre-training text-to-text transformers for concept-centric common sense[EB/OL]. (2022-02-10) [2022-11-10].
14. YU C W, CHEN J W, CHEN Y L. Enhanced LSTM framework for water-cooled chiller COP forecasting[C]// Proceedings of the 2021 IEEE International Conference on Consumer Electronics. Piscataway: IEEE, 2021: 1-3. DOI: 10.1109/ICCE50685.2021.9427706.
15. YUE Z Y, YE X, LIU R H. A survey of language model based pre-training technology[J]. Journal of Chinese Information Processing, 2021, 35(9): 15-29. (in Chinese) DOI: 10.3969/j.issn.1003-0077.2021.09.002.
16. DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 4171-4186. DOI: 10.18653/v1/N19-1423.
17. LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26) [2021-12-10].
18. CHEN Q, ZHU X D, LING Z H, et al. Enhanced LSTM for natural language inference[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2017: 1657-1668. DOI: 10.18653/v1/P17-1152.
19. HERBRICH R, GRAEPEL T, OBERMAYER K. Large margin rank boundaries for ordinal regression[M]// SMOLA A J, BARTLETT P L, SCHÖLKOPF B, et al. Advances in Large Margin Classifiers. Cambridge: MIT Press, 2000: 115-132. DOI: 10.7551/mitpress/1113.003.0010.
20. BURGES C, SHAKED T, RENSHAW E, et al. Learning to rank using gradient descent[C]// Proceedings of the 22nd International Conference on Machine Learning. New York: ACM, 2005: 89-96. DOI: 10.1145/1102351.1102363.
21. BURGES C J C, RAGNO R, LE Q V. Learning to rank with nonsmooth cost functions[C]// Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2006: 193-200. DOI: 10.7551/mitpress/7503.003.0029.
22. CAO Z, QIN T, LIU T Y, et al. Learning to rank: from pairwise approach to listwise approach[C]// Proceedings of the 24th International Conference on Machine Learning. New York: ACM, 2007: 129-136. DOI: 10.1145/1273496.1273513.
23. LI M H, LIU X L, van de WEIJER J, et al. Learning to rank for active learning: a listwise approach[C]// Proceedings of the 25th International Conference on Pattern Recognition. Piscataway: IEEE, 2021: 5587-5594. DOI: 10.1109/ICPR48806.2021.9412680.
24. QIN T, LIU T Y, LI H. A general approximation framework for direct optimization of information retrieval measures[J]. Information Retrieval, 2010, 13(4): 375-397. DOI: 10.1007/s10791-009-9124-x.
25. PAUL D, FRANK A. Social commonsense reasoning with multi-head knowledge attention[C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA: ACL, 2020: 2969-2980. DOI: 10.18653/v1/2020.findings-emnlp.267.
26. MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07) [2021-12-10].
27. PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1532-1543. DOI: 10.3115/v1/D14-1162.
28. PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Stroudsburg, PA: ACL, 2018: 2227-2237. DOI: 10.18653/v1/N18-1202.
29. HE P C, LIU X D, GAO J F, et al. DeBERTa: decoding-enhanced BERT with disentangled attention[EB/OL]. (2021-10-06) [2021-12-10].
30. LI W, GAO C, NIU G C, et al. UNIMO: towards unified-modal understanding and generation via cross-modal contrastive learning[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2021: 2592-2607. DOI: 10.18653/v1/2021.acl-long.202.