Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (2): 365-373. DOI: 10.11772/j.issn.1001-9081.2021122167

• Artificial Intelligence •


Answer selection model based on pooling and feature combination enhanced BERT

Jie HU1,2, Xiaoxi CHEN1, Yan ZHANG1,2

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan, Hubei 430062, China
    2. Hubei Engineering Technology Research Center for Educational Informatization (Hubei University), Wuhan, Hubei 430062, China
  • Received: 2021-12-29 Revised: 2022-06-04 Accepted: 2022-06-10 Online: 2022-06-30 Published: 2023-02-10
  • Contact: Jie HU
  • About author: CHEN Xiaoxi, born in 1997, M.S. candidate. Her research interests include natural language processing.
    ZHANG Yan, born in 1974, Ph.D., professor, CCF member. His research interests include software engineering and information security.
  • Supported by:
    National Natural Science Foundation of China (61977021)


Abstract:

Current mainstream models cannot fully represent the semantics of question-answer pairs and do not adequately capture the relationships between the topic information of questions and answers, and their activation functions suffer from soft saturation; all of these issues degrade the overall performance of the models. To address these problems, an answer selection model based on pooling and feature combination enhanced BERT (Bidirectional Encoder Representations from Transformers) was proposed. Firstly, on the basis of the pre-trained BERT model, adversarial samples were added and a pooling operation was introduced to represent the semantics of question-answer pairs. Secondly, feature combination of topic information was introduced to strengthen the relationships between the topic information of questions and answers. Finally, the activation function of the hidden layer was improved, and the concatenated vector was fed through the hidden layer and a classifier to complete the answer selection task. Validation results on the SemEval-2016 CQA and SemEval-2017 CQA datasets show that, compared with the tBERT model, the proposed model improves accuracy by 3.1 and 2.2 percentage points respectively, and improves F1 score by 2.0 and 3.1 percentage points respectively. It can be seen that the overall performance of the proposed model on the answer selection task is effectively improved, with both accuracy and F1 score surpassing those of the comparison models.
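The abstract outlines a pipeline of BERT encoding, pooling, topic-feature combination, an improved hidden-layer activation, and a classifier over the concatenated vector. Below is a minimal PyTorch sketch of that pipeline, not the authors' released code: the class name, the mean/max pooling combination, the topic-feature dimension, and GELU (standing in for the paper's unspecified improved activation) are all illustrative assumptions.

```python
# Minimal sketch (an illustration, not the authors' code) of the abstract's
# pipeline: BERT encodes a question-answer pair, pooling summarizes the token
# states, topic features are concatenated in, and a hidden layer plus
# classifier scores the answer.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class PoolingFeatureBert(nn.Module):
    def __init__(self, topic_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        d = self.bert.config.hidden_size
        # Input: [CLS] vector + mean pooling + max pooling + topic features.
        self.hidden = nn.Linear(3 * d + topic_dim, hidden_dim)
        self.act = nn.GELU()  # placeholder for the paper's improved activation
        self.classifier = nn.Linear(hidden_dim, 2)  # relevant / not relevant

    def forward(self, input_ids, attention_mask, topic_feats, token_type_ids=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        tokens = out.last_hidden_state               # (B, L, d)
        mask = attention_mask.unsqueeze(-1).float()  # (B, L, 1)
        mean_pool = (tokens * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        max_pool = tokens.masked_fill(mask == 0, float("-inf")).max(1).values
        feats = torch.cat([out.pooler_output, mean_pool, max_pool, topic_feats],
                          dim=-1)
        return self.classifier(self.act(self.hidden(feats)))


# Usage: encode a (question, answer) pair; topic features are zeros here.
tok = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tok("How do I reset my password?",
          "Use the 'Forgot password' link on the login page.",
          return_tensors="pt", truncation=True)
model = PoolingFeatureBert(topic_dim=16)
logits = model(enc["input_ids"], enc["attention_mask"],
               torch.zeros(1, 16), enc["token_type_ids"])
```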

Key words: answer selection, pre-trained model, pooling, feature combination, activation function
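The abstract also mentions adding adversarial samples on top of BERT, without detailing the procedure here. One common choice for BERT fine-tuning, shown purely as an assumption rather than the paper's stated method, is FGM (Fast Gradient Method): perturb the word-embedding matrix along the loss gradient, run a second forward pass, and accumulate the adversarial loss before the optimizer step.

```python
# Hypothetical FGM-style adversarial training step (an assumption, not the
# paper's confirmed method). Assumes `model` exposes its encoder as
# `model.bert`, as in the sketch above.
import torch

def fgm_training_step(model, loss_fn, inputs, labels, optimizer, epsilon=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(**inputs), labels)
    loss.backward()                                  # gradients on clean input
    emb = model.bert.get_input_embeddings().weight
    grad_norm = emb.grad.norm()
    if grad_norm > 0 and not torch.isnan(grad_norm):
        delta = epsilon * emb.grad / grad_norm       # FGM perturbation
        emb.data.add_(delta)                         # attack the embeddings
        loss_fn(model(**inputs), labels).backward()  # accumulate adversarial grads
        emb.data.sub_(delta)                         # restore the embeddings
    optimizer.step()
    return loss.item()
```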

CLC number: