Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (1): 64-70. DOI: 10.11772/j.issn.1001-9081.2021020335
Special Issue: Artificial Intelligence
Three-stage question answering model based on BERT
Yu PENG, Xiaoyu LI, Shijie HU, Xiaolei LIU, Weizhong QIAN
Received: 2021-03-08
Revised: 2021-05-12
Accepted: 2021-05-17
Online: 2021-05-24
Published: 2022-01-10
Contact: Xiaoyu LI
About author:
PENG Yu, born in 1996 in Meishan, Sichuan, M. S. candidate. His research interests include deep learning and natural language processing.
HU Shijie, born in 1998, M. S. candidate. His research interests include deep learning and natural language processing.
Yu PENG, Xiaoyu LI, Shijie HU, Xiaolei LIU, Weizhong QIAN. Three-stage question answering model based on BERT[J]. Journal of Computer Applications, 2022, 42(1): 64-70.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021020335
| Dataset | Language | Training samples | Test samples |
| --- | --- | --- | --- |
| SQuAD2.0 | English | 130 319 | 11 873 |
| CMRC2018 | Chinese | 10 321 | 3 351 |

Tab. 1 Statistics of experimental datasets
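For reference, both benchmarks are publicly distributed. The following is a minimal loading sketch, assuming the Hugging Face `datasets` library and the hub identifiers `squad_v2` and `cmrc2018` (these identifiers are an assumption, not part of the paper; split sizes in public releases may differ slightly from Tab. 1).

```python
# Minimal loading sketch; the hub identifiers "squad_v2" and "cmrc2018"
# are assumptions, not part of the original paper, and split sizes in
# public releases may differ slightly from the counts in Tab. 1.
from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")   # English span-extraction QA with unanswerable questions
cmrc2018 = load_dataset("cmrc2018")   # Chinese span-extraction QA

print(squad_v2)   # DatasetDict with train/validation splits
print(cmrc2018)   # DatasetDict with train/validation/test splits
```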
| Parameter | Value |
| --- | --- |
| epochs | 4 |
| batch_size | 24 |
| max_seq_length | 368 |
| dropout | 0.1 |
| learning rate | 0.000 05 |
| warm-up rate | 0.1 |

Tab. 2 Parameter settings
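As one illustration of how the settings in Tab. 2 map onto a standard BERT fine-tuning pipeline, the sketch below uses the Hugging Face `transformers` API; the framework and the `bert-base-uncased` checkpoint are assumptions, since the paper does not specify its implementation here. Note that `max_seq_length` is applied at tokenization time rather than through `TrainingArguments`.

```python
# Hedged sketch: maps Tab. 2 onto Hugging Face `transformers` (assumed
# framework; the "bert-base-uncased" checkpoint is also an assumption).
from transformers import (BertConfig, BertForQuestionAnswering,
                          BertTokenizerFast, TrainingArguments)

config = BertConfig.from_pretrained(
    "bert-base-uncased",
    hidden_dropout_prob=0.1,             # dropout = 0.1
    attention_probs_dropout_prob=0.1,
)
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased", config=config)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

args = TrainingArguments(
    output_dir="qa-model",
    num_train_epochs=4,                  # epochs = 4
    per_device_train_batch_size=24,      # batch_size = 24
    learning_rate=5e-5,                  # learning rate = 0.000 05
    warmup_ratio=0.1,                    # warm-up rate = 0.1
)

# max_seq_length = 368 is enforced when encoding question/context pairs:
features = tokenizer("What is BERT?", "BERT is a pre-trained language model.",
                     max_length=368, truncation="only_second", padding="max_length")
```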
| Model | EM (v1.1) | F1 (v1.1) | EM (v2.0) | F1 (v2.0) |
| --- | --- | --- | --- | --- |
| Human performance | 80.3 | 90.5 | 86.3 | 89.0 |
| BiDAF | 67.7 | 77.3 | 57.7 | 62.3 |
| Match-LSTM | 67.6 | 76.8 | 60.3 | 63.5 |
| SAN | 75.6 | 84.8 | 67.9 | 70.7 |
| QANet | 73.6 | 82.7 | 62.5 | 66.4 |
| BERT-base | 80.8 | 88.5 | 74.4 | 77.1 |
| +BiDAF | 81.9 | 89.0 | 74.0 | 76.9 |
| +SAN | 82.2 | 89.6 | 74.9 | 77.6 |
| +Proposed model | 82.8 | 88.9 | 76.8 | 78.7 |

Tab. 3 Result comparison of different models on SQuAD dataset
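For readers unfamiliar with the two metrics reported in Tab. 3 and Tab. 4: EM (exact match) scores a prediction 1 only if it equals a gold answer, while F1 measures token-level overlap between prediction and gold answer. The sketch below follows the official SQuAD evaluation logic in simplified form; the official scripts additionally normalize answers (lowercasing, stripping articles and punctuation) and, for CMRC2018, tokenize Chinese per character, both of which are simplified away here.

```python
# Simplified sketch of the SQuAD-style EM and F1 metrics used in
# Tab. 3 and Tab. 4. Answer normalization and per-character Chinese
# tokenization from the official scripts are omitted for brevity.
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the prediction equals the gold answer (case-insensitive), else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of overlap precision and recall."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

assert exact_match("BERT", "bert") == 1.0
assert round(f1_score("a pre-trained model", "pre-trained language model"), 2) == 0.67
```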
| Model | EM | F1 |
| --- | --- | --- |
| Human performance | 91.08 | 97.35 |
| T-Reader | 39.43 | 62.41 |
| SXU-Reader | 40.29 | 66.45 |
| R-NET | 45.42 | 69.83 |
| GM-Reader | 56.32 | 77.41 |
| MCA-Reader | 63.90 | 82.62 |
| BERT-base | 63.60 | 83.90 |
| +Proposed model | 65.00 | 85.10 |

Tab. 4 Result comparison of different models on CMRC2018 dataset