Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (2): 365-373. DOI: 10.11772/j.issn.1001-9081.2021122167
• Artificial Intelligence •
Jie HU 1,2, Xiaoxi CHEN 1, Yan ZHANG 1,2
Received:
2021-12-29
Revised:
2022-06-04
Accepted:
2022-06-10
Online:
2022-06-30
Published:
2023-02-10
Contact:
Jie HU
About author:
CHEN Xiaoxi, born in 1997, M.S. candidate. Her research interests include natural language processing.
Abstract:
Current mainstream models cannot adequately represent the semantics of question-answer pairs, do not fully consider the connections between the topic information of question-answer pairs, and use activation functions that suffer from soft saturation; all of these issues degrade overall model performance. To address them, an answer selection model based on pooling and feature combination enhanced BERT was proposed. First, adversarial samples were added on top of the pre-trained BERT model and a pooling operation was introduced to represent the semantics of question-answer pairs. Second, a combination of topic information features was introduced to strengthen the connections between the topic information of question-answer pairs. Finally, the activation function of the hidden layer was improved, and the concatenated vector was passed through the hidden layer and a classifier to complete the answer selection task. Validation results on the SemEval-2016CQA and SemEval-2017CQA datasets show that, compared with the tBERT model, the proposed model improves accuracy by 3.1 and 2.2 percentage points respectively, and F1 score by 2.0 and 3.1 percentage points respectively. Thus, the overall effectiveness of the proposed model on the answer selection task is clearly improved, with both accuracy and F1 score outperforming the comparison models.
Jie HU, Xiaoxi CHEN, Yan ZHANG. Answer selection model based on pooling and feature combination enhanced BERT[J]. Journal of Computer Applications, 2023, 43(2): 365-373.
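The pipeline the abstract describes — pool BERT token vectors into a question-answer representation, concatenate it with topic-information features, then pass the result through a hidden layer with a non-saturating activation before the classifier — can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names, toy dimensions, and the leaky-tanh stand-in for the "improved activation function" are all assumptions.

```python
import math

def mean_pool(token_vecs):
    """Average token vectors into a single fixed-size sentence vector."""
    dim = len(token_vecs[0])
    return [sum(v[i] for v in token_vecs) / len(token_vecs) for i in range(dim)]

def combine_features(pooled_qa, topic_q, topic_a):
    """Concatenate the pooled QA representation with both topic-feature vectors."""
    return pooled_qa + topic_q + topic_a

def hidden_layer(x, leak=0.01):
    # tanh saturates (gradient -> 0) for large |x|; adding a small linear
    # "leak" term is one plausible non-saturating variant, used here only
    # for illustration -- not the paper's exact activation.
    return [math.tanh(v) + leak * v for v in x]

# Toy usage: 4-dimensional "BERT" token vectors and 2-dimensional topic features.
tokens = [[0.2, -0.1, 0.5, 0.3], [0.4, 0.1, -0.5, 0.7]]
pooled = mean_pool(tokens)                                  # QA-pair semantics
features = combine_features(pooled, [0.8, 0.2], [0.6, 0.4])  # add topic links
hidden = hidden_layer(features)                             # fed to classifier
```

The real model would learn weights for the hidden layer and classifier; here the point is only the shape of the data flow.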
| Dataset | Training samples | Validation samples | Test samples | Average text length |
|---|---|---|---|---|
| SemEval-2016CQA | 20 340 | 2 440 | 3 270 | 42 |
| SemEval-2017CQA | 14 110 | 2 440 | 2 930 | 46 |
| MSRP | 3 576 | 500 | 1 725 | 18 |

Tab. 1 Description of datasets
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| learning-rate | | batch_size | 16 |
| optimization | Adam | number of topics | 70 |
| epochs | 3 | LDA alpha value | 50 |
| hidden_size | 768 | | |

Tab. 2 Parameter setting
| Model | SemEval-2016CQA accuracy | SemEval-2016CQA F1 | SemEval-2017CQA accuracy | SemEval-2017CQA F1 |
|---|---|---|---|---|
| tBERT | 77.6 | 74.1 | 78.3 | 76.8 |
| tBERT-AT | 78.6 | 74.9 | 79.4 | 77.9 |
| tBERT-pooling | 78.0 | 74.4 | 78.6 | 77.2 |
| tBERT-AT-pooling | 78.8 | 75.3 | 79.6 | 78.1 |

Tab. 3 Comparison of accuracy and F1 scores of tBERT, tBERT-AT, tBERT-pooling and tBERT-AT-pooling models (%)
| Model | SemEval-2016CQA accuracy | SemEval-2016CQA F1 | SemEval-2017CQA accuracy | SemEval-2017CQA F1 |
|---|---|---|---|---|
| tBERT | 77.6 | 74.1 | 78.3 | 76.8 |
| tBERT-feature combination | 77.9 | 74.3 | 78.5 | 77.0 |
| tBERT-AT | 78.6 | 74.9 | 79.4 | 77.9 |
| tBERT-AT-feature combination | 78.9 | 75.1 | 79.5 | 78.1 |
| tBERT-pooling | 78.0 | 74.4 | 78.6 | 77.2 |
| tBERT-pooling-feature combination | 78.4 | 74.7 | 78.8 | 77.5 |
| tBERT-AT-pooling | 78.8 | 75.3 | 79.6 | 78.1 |
| tBERT-AT-pooling-feature combination | 79.2 | 75.6 | 79.9 | 78.6 |

Tab. 4 Comparison of accuracy and F1 scores of tBERT, tBERT-AT, tBERT-pooling and tBERT-AT-pooling models before and after introducing combination of topic information features (%)
| Model | SemEval-2016CQA accuracy | SemEval-2016CQA F1 | SemEval-2017CQA accuracy | SemEval-2017CQA F1 |
|---|---|---|---|---|
| tBERT-tanh | 77.6 | 74.1 | 78.3 | 76.8 |
| tBERT-improved activation function | 78.5 | 74.3 | 79.0 | 77.3 |
| tBERT-AT-feature combination-tanh | 78.9 | 75.1 | 79.5 | 78.1 |
| tBERT-AT-feature combination-improved activation function | 79.1 | 75.3 | 79.7 | 78.4 |
| tBERT-pooling-feature combination-tanh | 78.4 | 74.7 | 78.8 | 77.5 |
| tBERT-pooling-feature combination-improved activation function | 79.3 | 75.6 | 80.1 | 78.2 |
| tBERT-AT-pooling-feature combination-tanh | 79.2 | 75.6 | 79.9 | 78.6 |
| Proposed model | 80.7 | 76.1 | 80.5 | 79.9 |

Tab. 5 Comparison of accuracy and F1 scores of tBERT, tBERT-AT-feature combination, tBERT-pooling-feature combination and tBERT-AT-pooling-feature combination models before and after improving activation function (%)
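The soft saturation motivating the activation-function change evaluated above can be checked numerically: the derivative of tanh, 1 − tanh(x)², collapses toward zero as |x| grows, so large pre-activations contribute almost no gradient. A minimal check in plain Python:

```python
import math

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2. It peaks at 1 for x = 0 and decays
    # toward 0 as |x| grows -- the "soft saturation" the improved activation
    # function is meant to mitigate.
    return 1.0 - math.tanh(x) ** 2

for x in (0.0, 2.0, 5.0):
    print(f"x = {x}: tanh'(x) = {tanh_grad(x):.6f}")
```

At x = 5 the gradient is already below 10⁻³, which is why a non-saturating variant can train the hidden layer more effectively.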
| Model | MSRP accuracy | MSRP F1 |
|---|---|---|
| tBERT-tanh | 89.5 | 88.4 |
| tBERT-improved activation function | 89.8 | 88.6 |

Tab. 6 Comparison of accuracy and F1 scores of tBERT before and after improving activation function on MSRP dataset (%)
| Model | SemEval-2016CQA accuracy | SemEval-2016CQA F1 | SemEval-2017CQA accuracy | SemEval-2017CQA F1 |
|---|---|---|---|---|
| LDA topic model | 70.3 | 67.6 | 71.4 | 68.4 |
| ECNU | 74.3 | 66.7 | 78.4 | 77.6 |
| Siamese-BiLSTM | 74.6 | 68.7 | 75.3 | 67.1 |
| UIA-LSTM-CNN | 78.2 | 68.4 | 77.1 | 76.4 |
| AUANN | 80.5 | 74.5 | 78.5 | 79.8 |
| BERT | 75.6 | 71.9 | 76.2 | 70.4 |
| GMN-BERT | 76.7 | 72.8 | 77.5 | 71.6 |
| BERT-pooling | 76.1 | 72.5 | 77.1 | 71.1 |
| tBERT | 77.6 | 74.1 | 78.3 | 76.8 |
| Proposed model | 80.7 | 76.1 | 80.5 | 79.9 |

Tab. 7 Comparison of accuracy and F1 scores of related models (%)
| Model | Attention visualization example |
|---|---|
| tBERT model | Question: How much salary? Hi everyone I’m in the process of negotiating my salary but I have no idea how much should be the salary of mechanical engineer with grade 5 in a government company and the benefits. This will be my first time in Qatar. Kindly help me. Thanks in advance. (attention highlighting shown in the original figure) |
| Proposed model | Question: same as above, with the proposed model's attention highlighting shown in the original figure |

Tab. 8 Comparison of attention visualization of the same example between tBERT and proposed model
| Question | Answer predicted by tBERT model | Answer predicted by proposed model |
|---|---|---|
| How much salary? Hi everyone I’m in the process of negotiating my salary but I have no idea how much should be the salary of mechanical engineer with grade 5 in a government company and the benefits. This will be my first time in Qatar. Kindly help me. Thanks in advance. | Hey; I am a Mechanical Engineer as well and working in Qatar. You can email me and we can discus it further. | That should be around 12-15 and you should get free government housing and a 3 000 mobile and internet allowance. That’s it. |

Tab. 9 Comparison of answers to the same question predicted by tBERT and proposed model
1 | LASKAR M T R, HUANG J X, HOQUE E. Contextualized embeddings based transformer encoder for sentence similarity modeling in answer selection task[C]// Proceedings of the 12th Language Resources and Evaluation Conference. [S.l.]: European Language Resources Association, 2020: 5505-5514. |
2 | YANG L, AI Q Y, GUO J F, et al. aNMM: ranking short answer texts with attention-based neural matching model[C]// Proceedings of the 25th ACM International Conference on Information and Knowledge Management. New York: ACM, 2016: 287-296. 10.1145/2983323.2983818 |
3 | YANG R Q, ZHANG J H, GAO X, et al. Simple and effective text matching with richer alignment features[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019:4699-4709. 10.18653/v1/p19-1465 |
4 | NECULOIU P, VERSTEEGH M, ROTARU M. Learning text similarity with Siamese recurrent networks[C]// Proceedings of the 1st Workshop on Representation Learning for NLP. Stroudsburg, PA: ACL, 2016: 148-157. 10.18653/v1/w16-1617 |
5 | BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. The Journal of Machine Learning Research, 2003, 3: 993-1022. |
6 | MIHAYLOV T, NAKOV P. SemanticZ at SemEval-2016 Task 3: ranking relevant answers in community question answering using semantic similarity based on fine-tuned word embeddings[C]// Proceedings of the 10th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2016: 879-886. 10.18653/v1/s16-1136 |
7 | WU G S, SHENG Y X, LAN M, et al. ECNU at SemEval-2017 task 3: using traditional and deep learning methods to address community question answering task[C]// Proceedings of the 11th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2017: 365-369. 10.18653/v1/s17-2060 |
8 | WEN J H, MA J W, FENG Y L, et al. Hybrid attentive answer selection in CQA with deep users modelling[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 2556-2563. 10.1609/aaai.v32i1.11840 |
9 | XIE Y X, SHEN Y, LI Y L, et al. Attentive user-engaged adversarial neural network for community question answering[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020:9322-9329. 10.1609/aaai.v34i05.6472 |
10 | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07) [2021-01-06]. |
11 | PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1532-1543. 10.3115/v1/d14-1162 |
12 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 4171-4186. 10.18653/v1/N19-1423 |
13 | LASKAR M T R, HOQUE E, HUANG J X. Utilizing bidirectional encoder representations from transformers for answer selection[C]// Proceedings of the 2019 International Conference on Applied Mathematics, Modeling and Computational Science, PROMS 343. Cham: Springer, 2021: 693-703. 10.1007/978-3-030-63591-6_63 |
14 | CHEN L, ZHAO Y B, LV B, et al. Neural graph matching networks for Chinese short text matching[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020:6152-6158. 10.18653/v1/2020.acl-main.547 |
15 | PEINELT N, NGUYEN D, LIAKATA M. tBERT: topic models and BERT joining forces for semantic similarity detection[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 7047-7055. 10.18653/v1/2020.acl-main.630 |
16 | REIMERS N, GUREVYCH I. Sentence-BERT: sentence embeddings using Siamese BERT-networks[EB/OL]. (2019-08-27) [2020-03-24]. 10.18653/v1/d19-1410 |
17 | LUAN K X, DU X K, SUN C J, et al. Sentence ordering based on attention mechanism[J]. Journal of Chinese Information Processing, 2018, 32(1): 123-130. (in Chinese) 10.3969/j.issn.1003-0077.2018.01.016 |
18 | NAKOV P, MÀRQUEZ L, MOSCHITTI A, et al. SemEval-2016 Task 3: community question answering[C]// Proceedings of the 10th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2016: 525-545. 10.18653/v1/s16-1083 |
19 | NAKOV P, HOOGEVEEN D, MÀRQUEZ L, et al. SemEval-2017 Task 3: community question answering[C]// Proceedings of the 11th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2017:27-48. 10.18653/v1/s17-2003 |