Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (2): 365-373.DOI: 10.11772/j.issn.1001-9081.2021122167
Special Issue: Artificial Intelligence
Jie HU1,2, Xiaoxi CHEN1, Yan ZHANG1,2
Received: 2021-12-29
Revised: 2022-06-04
Accepted: 2022-06-10
Online: 2022-06-30
Published: 2023-02-10
Contact: Jie HU
About author: CHEN Xiaoxi, born in 1997, M. S. candidate. Her research interests include natural language processing.
Jie HU, Xiaoxi CHEN, Yan ZHANG. Answer selection model based on pooling and feature combination enhanced BERT[J]. Journal of Computer Applications, 2023, 43(2): 365-373.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021122167
Tab. 1 Description of datasets

| Dataset | Training samples | Validation samples | Test samples | Average text length |
|---|---|---|---|---|
| SemEval-2016CQA | 20 340 | 2 440 | 3 270 | 42 |
| SemEval-2017CQA | 14 110 | 2 440 | 2 930 | 46 |
| MSRP | 3 576 | 500 | 1 725 | 18 |
Tab. 2 Parameter setting

| Parameter | Value |
|---|---|
| learning-rate | |
| optimization | Adam |
| epochs | 3 |
| hidden_size | 768 |
| batch_size | 16 |
| number of topics | 70 |
| LDA alpha | 50 |
Tab. 3 Comparison of accuracy and F1 scores of tBERT, tBERT-AT, tBERT-pooling and tBERT-AT-pooling models

| Model | SemEval-2016CQA Accuracy | SemEval-2016CQA F1 | SemEval-2017CQA Accuracy | SemEval-2017CQA F1 |
|---|---|---|---|---|
| tBERT | 77.6 | 74.1 | 78.3 | 76.8 |
| tBERT-AT | 78.6 | 74.9 | 79.4 | 77.9 |
| tBERT-pooling | 78.0 | 74.4 | 78.6 | 77.2 |
| tBERT-AT-pooling | 78.8 | 75.3 | 79.6 | 78.1 |
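The "pooling" variants in Tab. 3 condense BERT's per-token hidden states into a single sentence vector before classification. The paper's exact pooling operator is not reproduced here; the following is a minimal mean-pooling sketch in pure Python, with hidden_size trimmed to 4 for readability (the model in Tab. 2 uses 768):

```python
def mean_pool(hidden_states):
    """Average per-token vectors into one sentence vector.

    hidden_states: list of token vectors, each of length hidden_size.
    """
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / n for d in range(dim)]

# Toy example: 3 tokens, hidden_size = 4 (illustrative values only).
tokens = [[1.0, 2.0, 0.0, 4.0],
          [3.0, 2.0, 2.0, 0.0],
          [2.0, 2.0, 4.0, 2.0]]
print(mean_pool(tokens))  # → [2.0, 2.0, 2.0, 2.0]
```

In practice the same averaging is applied to the 768-dimensional hidden states of the question and the candidate answer separately, and the pooled vectors are then fed to the matching layer.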
Tab. 4 Comparison of accuracy and F1 scores of tBERT, tBERT-AT, tBERT-pooling and tBERT-AT-pooling models before and after introducing topic information feature combination

| Model | SemEval-2016CQA Accuracy | SemEval-2016CQA F1 | SemEval-2017CQA Accuracy | SemEval-2017CQA F1 |
|---|---|---|---|---|
| tBERT | 77.6 | 74.1 | 78.3 | 76.8 |
| tBERT-feature combination | 77.9 | 74.3 | 78.5 | 77.0 |
| tBERT-AT | 78.6 | 74.9 | 79.4 | 77.9 |
| tBERT-AT-feature combination | 78.9 | 75.1 | 79.5 | 78.1 |
| tBERT-pooling | 78.0 | 74.4 | 78.6 | 77.2 |
| tBERT-pooling-feature combination | 78.4 | 74.7 | 78.8 | 77.5 |
| tBERT-AT-pooling | 78.8 | 75.3 | 79.6 | 78.1 |
| tBERT-AT-pooling-feature combination | 79.2 | 75.6 | 79.9 | 78.6 |
Tab. 5 Comparison of accuracy and F1 scores of tBERT, tBERT-AT-feature combination, tBERT-pooling-feature combination and tBERT-AT-pooling-feature combination models before and after improving activation function

| Model | SemEval-2016CQA Accuracy | SemEval-2016CQA F1 | SemEval-2017CQA Accuracy | SemEval-2017CQA F1 |
|---|---|---|---|---|
| tBERT-tanh | 77.6 | 74.1 | 78.3 | 76.8 |
| tBERT-improved activation function | 78.5 | 74.3 | 79.0 | 77.3 |
| tBERT-AT-feature combination-tanh | 78.9 | 75.1 | 79.5 | 78.1 |
| tBERT-AT-feature combination-improved activation function | 79.1 | 75.3 | 79.7 | 78.4 |
| tBERT-pooling-feature combination-tanh | 78.4 | 74.7 | 78.8 | 77.5 |
| tBERT-pooling-feature combination-improved activation function | 79.3 | 75.6 | 80.1 | 78.2 |
| tBERT-AT-pooling-feature combination-tanh | 79.2 | 75.6 | 79.9 | 78.6 |
| Proposed model | 80.7 | 76.1 | 80.5 | 79.9 |
Tab. 6 Comparison of accuracy and F1 scores of tBERT before and after improving activation function on MSRP dataset

| Model | MSRP Accuracy | MSRP F1 |
|---|---|---|
| tBERT-tanh | 89.5 | 88.4 |
| tBERT-improved activation function | 89.8 | 88.6 |
Tab. 7 Comparison of accuracy and F1 scores of related models

| Model | SemEval-2016CQA Accuracy | SemEval-2016CQA F1 | SemEval-2017CQA Accuracy | SemEval-2017CQA F1 |
|---|---|---|---|---|
| LDA topic model | 70.3 | 67.6 | 71.4 | 68.4 |
| ECNU | 74.3 | 66.7 | 78.4 | 77.6 |
| Siamese-BiLSTM | 74.6 | 68.7 | 75.3 | 67.1 |
| UIA-LSTM-CNN | 78.2 | 68.4 | 77.1 | 76.4 |
| AUANN | 80.5 | 74.5 | 78.5 | 79.8 |
| BERT | 75.6 | 71.9 | 76.2 | 70.4 |
| GMN-BERT | 76.7 | 72.8 | 77.5 | 71.6 |
| BERT-pooling | 76.1 | 72.5 | 77.1 | 71.1 |
| tBERT | 77.6 | 74.1 | 78.3 | 76.8 |
| Proposed model | 80.7 | 76.1 | 80.5 | 79.9 |
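Tabs. 3-7 report accuracy and F1 in percent. The paper does not spell out the averaging scheme; assuming the common choice for answer selection of binary F1 on the "relevant answer" class, the metric can be sketched in pure Python as:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: 1 = relevant answer, 0 = irrelevant (illustrative data only).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.667 0.667 0.667
```

Accuracy is simply the fraction of matching labels; F1 is preferred alongside it here because the CQA datasets are class-imbalanced.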
Tab. 8 Comparison of attention visualization of the same example between tBERT and proposed model

| Model | Attention visualization example |
|---|---|
| tBERT model | Question: How much salary? Hi everyone I'm in the process of negotiating my salary but I have no idea how much should be the salary of mechanical engineer with grade 5 in a government company and the benefits. This will be my first time in Qatar. Kindly help me. Thanks in advance. |
| Proposed model | Question: How much salary? Hi everyone I'm in the process of negotiating my salary but I have no idea how much should be the salary of mechanical engineer with grade 5 in a government company and the benefits. This will be my first time in Qatar. Kindly help me. Thanks in advance. |
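The visualization in Tab. 8 shades each question token by its attention weight (the shading itself is not recoverable from the extracted text). As a reminder of where such weights come from, here is a minimal pure-Python sketch of scaled dot-product attention weights for one query over a sequence of keys, with toy 2-dimensional vectors standing in for token embeddings:

```python
import math

def attention_weights(query, keys):
    """Softmax-normalized scaled dot-product scores of a query over keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (illustrative only): the first key aligns with the query most.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = attention_weights(query, keys)
print([round(w, 3) for w in weights])  # weights sum to 1; first token largest
```

Tokens whose keys align with the query receive larger weights, which is exactly what the darker highlighting in the visualization conveys.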
Tab. 9 Comparison of answers to the same question predicted by tBERT and proposed model

| Question | Answer (tBERT model) | Answer (proposed model) |
|---|---|---|
| How much salary? Hi everyone I'm in the process of negotiating my salary but I have no idea how much should be the salary of mechanical engineer with grade 5 in a government company and the benefits. This will be my first time in Qatar. Kindly help me. Thanks in advance. | Hey; I am a Mechanical Engineer as well and working in Qatar. You can email me and we can discus it further. | That should be around 12-15 and you should get free government housing and a 3 000 mobile and internet allowance. That's it. |
1 | LASKAR M T R, HUANG J X, HOQUE E. Contextualized embeddings based transformer encoder for sentence similarity modeling in answer selection task[C]// Proceedings of the 12th Language Resources and Evaluation Conference. [S.l.]: European Language Resources Association, 2020: 5505-5514. |
2 | YANG L, AI Q Y, GUO J F, et al. aNMM: ranking short answer texts with attention-based neural matching model[C]// Proceedings of the 25th ACM International Conference on Information and Knowledge Management. New York: ACM, 2016: 287-296. 10.1145/2983323.2983818 |
3 | YANG R Q, ZHANG J H, GAO X, et al. Simple and effective text matching with richer alignment features[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019:4699-4709. 10.18653/v1/p19-1465 |
4 | NECULOIU P, VERSTEEGH M, ROTARU M. Learning text similarity with Siamese recurrent networks[C]// Proceedings of the 1st Workshop on Representation Learning for NLP. Stroudsburg, PA: ACL, 2016: 148-157. 10.18653/v1/w16-1617 |
5 | BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. The Journal of Machine Learning Research, 2003, 3: 993-1022. |
6 | MIHAYLOV T, NAKOV P. SemanticZ at SemEval-2016 Task 3: ranking relevant answers in community question answering using semantic similarity based on fine-tuned word embeddings[C]// Proceedings of the 10th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2016: 879-886. 10.18653/v1/s16-1136 |
7 | WU G S, SHENG Y X, LAN M, et al. ECNU at SemEval-2017 task 3: using traditional and deep learning methods to address community question answering task[C]// Proceedings of the 11th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2017: 365-369. 10.18653/v1/s17-2060 |
8 | WEN J H, MA J W, FENG Y L, et al. Hybrid attentive answer selection in CQA with deep users modelling[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 2556-2563. 10.1609/aaai.v32i1.11840 |
9 | XIE Y X, SHEN Y, LI Y L, et al. Attentive user-engaged adversarial neural network for community question answering[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020:9322-9329. 10.1609/aaai.v34i05.6472 |
10 | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07) [2021-01-06]. |
11 | PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1532-1543. 10.3115/v1/d14-1162 |
12 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 4171-4186. 10.18653/v1/N19-1423 |
13 | LASKAR M T R, HOQUE E, HUANG J X. Utilizing bidirectional encoder representations from transformers for answer selection[C]// Proceedings of the 2019 International Conference on Applied Mathematics, Modeling and Computational Science, PROMS 343. Cham: Springer, 2021: 693-703. 10.1007/978-3-030-63591-6_63 |
14 | CHEN L, ZHAO Y B, LV B, et al. Neural graph matching networks for Chinese short text matching[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020:6152-6158. 10.18653/v1/2020.acl-main.547 |
15 | PEINELT N, NGUYEN D, LIAKATA M. tBERT: topic models and BERT joining forces for semantic similarity detection[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 7047-7055. 10.18653/v1/2020.acl-main.630 |
16 | REIMERS N, GUREVYCH I. Sentence-BERT: sentence embeddings using Siamese BERT-networks[EB/OL]. (2019-08-27) [2020-03-24]. 10.18653/v1/d19-1410 |
17 | LUAN K X, DU X K, SUN C J, et al. Sentence ordering based on attention mechanism[J]. Journal of Chinese Information Processing, 2018, 32(1): 123-130. 10.3969/j.issn.1003-0077.2018.01.016 |
18 | NAKOV P, MÀRQUEZ L, MOSCHITTI A, et al. SemEval-2016 Task 3: community question answering[C]// Proceedings of the 10th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2016: 525-545. 10.18653/v1/s16-1083 |
19 | NAKOV P, HOOGEVEEN D, MÀRQUEZ L, et al. SemEval-2017 Task 3: community question answering[C]// Proceedings of the 11th International Workshop on Semantic Evaluation. Stroudsburg, PA: ACL, 2017:27-48. 10.18653/v1/s17-2003 |