Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (1): 64-70. DOI: 10.11772/j.issn.1001-9081.2021020335
Special Issue: Artificial Intelligence
Three-stage question answering model based on BERT
Yu PENG, Xiaoyu LI, Shijie HU, Xiaolei LIU, Weizhong QIAN
Received: 2021-03-08
Revised: 2021-05-12
Accepted: 2021-05-17
Online: 2021-05-24
Published: 2022-01-10
Contact: Xiaoyu LI
About author:
PENG Yu, born in 1996 in Meishan, Sichuan, M. S. candidate. His research interests include deep learning and natural language processing.
HU Shijie, born in 1998, M. S. candidate. His research interests include deep learning and natural language processing.
Yu PENG, Xiaoyu LI, Shijie HU, Xiaolei LIU, Weizhong QIAN. Three-stage question answering model based on BERT[J]. Journal of Computer Applications, 2022, 42(1): 64-70.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021020335
| Dataset | Language | Training samples | Test samples |
| --- | --- | --- | --- |
| SQuAD2.0 | English | 130 319 | 11 873 |
| CMRC2018 | Chinese | 10 321 | 3 351 |

Tab. 1 Statistics of experimental datasets
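For reference, both benchmarks are publicly distributed. The following is a minimal loading sketch, assuming the Hugging Face `datasets` library and the hub identifiers `squad_v2` and `cmrc2018` (these identifiers are an assumption, not part of the paper; split sizes in public releases may differ slightly from Tab. 1).

```python
# Minimal loading sketch; the hub identifiers "squad_v2" and "cmrc2018"
# are assumptions, not part of the original paper, and split sizes in
# public releases may differ slightly from the counts in Tab. 1.
from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")   # English span-extraction QA with unanswerable questions
cmrc2018 = load_dataset("cmrc2018")   # Chinese span-extraction QA

print(squad_v2)   # DatasetDict with train/validation splits
print(cmrc2018)   # DatasetDict with train/validation/test splits
```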
| Parameter | Value |
| --- | --- |
| epochs | 4 |
| batch_size | 24 |
| max_seq_length | 368 |
| dropout | 0.1 |
| learning rate | 0.000 05 |
| warm-up rate | 0.1 |

Tab. 2 Parameter settings
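As one illustration of how the settings in Tab. 2 map onto a standard BERT fine-tuning pipeline, the sketch below uses the Hugging Face `transformers` API; the framework and the `bert-base-uncased` checkpoint are assumptions, since the paper does not specify its implementation here. Note that `max_seq_length` is applied at tokenization time rather than through `TrainingArguments`.

```python
# Hedged sketch: maps Tab. 2 onto Hugging Face `transformers` (assumed
# framework; the "bert-base-uncased" checkpoint is also an assumption).
from transformers import (BertConfig, BertForQuestionAnswering,
                          BertTokenizerFast, TrainingArguments)

config = BertConfig.from_pretrained(
    "bert-base-uncased",
    hidden_dropout_prob=0.1,             # dropout = 0.1
    attention_probs_dropout_prob=0.1,
)
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased", config=config)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

args = TrainingArguments(
    output_dir="qa-model",
    num_train_epochs=4,                  # epochs = 4
    per_device_train_batch_size=24,      # batch_size = 24
    learning_rate=5e-5,                  # learning rate = 0.000 05
    warmup_ratio=0.1,                    # warm-up rate = 0.1
)

# max_seq_length = 368 is enforced when encoding question/context pairs:
features = tokenizer("What is BERT?", "BERT is a pre-trained language model.",
                     max_length=368, truncation="only_second", padding="max_length")
```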
| Model | EM (v1.1) | F1 (v1.1) | EM (v2.0) | F1 (v2.0) |
| --- | --- | --- | --- | --- |
| Human performance | 80.3 | 90.5 | 86.3 | 89.0 |
| BiDAF | 67.7 | 77.3 | 57.7 | 62.3 |
| Match-LSTM | 67.6 | 76.8 | 60.3 | 63.5 |
| SAN | 75.6 | 84.8 | 67.9 | 70.7 |
| QANet | 73.6 | 82.7 | 62.5 | 66.4 |
| BERT-base | 80.8 | 88.5 | 74.4 | 77.1 |
| +BiDAF | 81.9 | 89.0 | 74.0 | 76.9 |
| +SAN | 82.2 | 89.6 | 74.9 | 77.6 |
| +Proposed model | 82.8 | 88.9 | 76.8 | 78.7 |

Tab. 3 Result comparison of different models on SQuAD dataset
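For readers unfamiliar with the two metrics reported in Tab. 3 and Tab. 4: EM (exact match) scores a prediction 1 only if it equals a gold answer, while F1 measures token-level overlap between prediction and gold answer. The sketch below follows the official SQuAD evaluation logic in simplified form; the official scripts additionally normalize answers (lowercasing, stripping articles and punctuation) and, for CMRC2018, tokenize Chinese per character, both of which are simplified away here.

```python
# Simplified sketch of the SQuAD-style EM and F1 metrics used in
# Tab. 3 and Tab. 4. Answer normalization and per-character Chinese
# tokenization from the official scripts are omitted for brevity.
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the prediction equals the gold answer (case-insensitive), else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of overlap precision and recall."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

assert exact_match("BERT", "bert") == 1.0
assert round(f1_score("a pre-trained model", "pre-trained language model"), 2) == 0.67
```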
| Model | EM | F1 |
| --- | --- | --- |
| Human performance | 91.08 | 97.35 |
| T-Reader | 39.43 | 62.41 |
| SXU-Reader | 40.29 | 66.45 |
| R-NET | 45.42 | 69.83 |
| GM-Reader | 56.32 | 77.41 |
| MCA-Reader | 63.90 | 82.62 |
| BERT-base | 63.60 | 83.90 |
| +Proposed model | 65.00 | 85.10 |

Tab. 4 Result comparison of different models on CMRC2018 dataset