Answer selection model based on dynamic attention and multi-perspective matching

doi:10.11772/j.issn.1001-9081.2021010027

Journal of Computer Applications, 2021, Vol. 41, Issue 11: 3156-3163. DOI: 10.11772/j.issn.1001-9081.2021010027

Special Issue: Artificial intelligence

Zhichao LI, Tohti TURDI, Hamdulla ASKAR

College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China

Received: 2021-01-11 Revised: 2021-05-24 Accepted: 2021-05-25 Online: 2021-11-29 Published: 2021-11-10
Contact: Tohti TURDI
About author: LI Zhichao, born in 1993, M.S. candidate. His research interests include question answering systems and natural language processing.
TURDI Tohti, born in 1975, Ph.D., associate professor. His research interests include natural language processing, text mining, and machine learning.
ASKAR Hamdulla, born in 1972, Ph.D., professor. His research interests include intelligent information processing and machine learning.
Supported by:
the Natural Science Foundation of Xinjiang Uygur Autonomous Region(2021D01C076)

Chinese title: 基于动态注意力和多角度匹配的答案选择模型

Chinese author names: 李志超, 吐尔地·托合提, 艾斯卡尔·艾木都拉

Author notes: LI Zhichao is from Lianyuan, Hunan. TURDI Tohti is from Urumqi, Xinjiang, and is a CCF senior member. ASKAR Hamdulla is from Urumqi, Xinjiang, and is a doctoral supervisor and CCF senior member.

Abstract

Mainstream neural networks for answer selection cannot achieve both a full representation of each sentence and full information interaction between sentences at the same time. To solve this problem, an answer selection model based on Dynamic Attention and Multi-Perspective Matching (DAMPM) was proposed. Firstly, pre-trained Embeddings from Language Models (ELMo) were introduced to obtain word vectors containing simple semantic information. Secondly, a filtering mechanism was used in the attention layer to remove noise from the sentences effectively, yielding better representations of the question and answer sentences. Thirdly, multiple matching strategies were applied together in the matching layer to complete the information interaction between sentence vectors. Then, the sentence vectors output by the matching layer were concatenated by a Bidirectional Long Short-Term Memory (BiLSTM) network. Finally, a classifier calculated the similarity of the concatenated vectors, giving the semantic correlation between the question and answer sentences. Experimental results on the Text REtrieval Conference Question Answering (TRECQA) dataset show that, compared with the Dynamic-Clip Attention Network (DCAN), one of the baseline models built on the compare-aggregate framework, the proposed DAMPM improves both Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) by 1.6 percentage points. On the Wiki Question Answering (WikiQA) dataset, the MAP and MRR of DAMPM are 0.7 and 0.8 percentage points higher than those of DCAN, respectively. In general, the proposed DAMPM performs better than the baseline methods.
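The filtering step in the attention layer can be illustrated with a small sketch. The results reported for this model distinguish a K-Max and a K-Threshold variant; the code below assumes the usual dynamic-clip reading of those names (keep the k largest attention weights, or keep the weights at or above a threshold, then renormalise). The function names, the renormalisation step, and the example scores are assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into attention weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def renormalise(weights):
    """Rescale the surviving weights so they sum to 1 (unchanged if none survive)."""
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(weights)


def k_max_filter(weights, k):
    """Assumed K-Max variant: keep the k largest weights, zero out the rest."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = set(order[:k])
    return renormalise([w if i in keep else 0.0 for i, w in enumerate(weights)])


def k_threshold_filter(weights, threshold):
    """Assumed K-Threshold variant: keep weights at or above the threshold."""
    return renormalise([w if w >= threshold else 0.0 for w in weights])


# Hypothetical similarity scores between one question word and five answer words.
scores = [2.0, 0.1, 1.5, -1.0, 0.0]
weights = softmax(scores)
print(k_max_filter(weights, k=2))
print(k_threshold_filter(weights, threshold=0.05))
```

With K-Max the number of retained words is fixed, while with K-Threshold it varies with how sharply the attention is concentrated; in both cases low-weight words no longer contribute to the attended representation.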

Key words: neural network, answer selection, dynamic attention mechanism, multi-perspective matching, pre-trained language model

CLC Number: TP391.1

Zhichao LI, Tohti TURDI, Hamdulla ASKAR. Answer selection model based on dynamic attention and multi-perspective matching[J]. Journal of Computer Applications, 2021, 41(11): 3156-3163.

References

1	TAN M, SANTOS C DOS, XIANG B, et al. LSTM-based deep learning models for non-factoid answer selection [EB/OL]. (2016-03-28) [2019-01-10].
2	HE H, GIMPEL K, LIN J. Multi-perspective sentence similarity modeling with convolutional neural networks [C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 1576-1586. DOI: 10.18653/v1/d15-1181
3	GARG S, VU T, MOSCHITTI A. TANDA: transfer and adapt pre-trained transformer models for answer sentence selection [EB/OL]. [2020-05-01]. DOI: 10.1609/aaai.v34i05.6282
4	SUN Y, WANG J, ZHANG Y J, et al. Long answer selection neural model integrating coarse and fine granularity information [J]. Journal of Chinese Information Processing, 2021, 35(4): 100-109. DOI: 10.3969/j.issn.1003-0077.2021.04.014
5	FENG W Z, TANG J. Answer selection model integrating depth matching features [J]. Journal of Chinese Information Processing, 2019, 33(1): 118-124. DOI: 10.3969/j.issn.1003-0077.2019.01.014
6	PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations [C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg: ACL, 2018: 2227-2237.
7	KENTER T, BORISOV A, DE RIJKE M. Siamese CBOW: optimizing word embeddings for sentence representations [C]// Proceedings of the 2016 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2016: 941-951. DOI: 10.18653/v1/p16-1089
8	MUELLER J, THYAGARAJAN A. Siamese recurrent architectures for learning sentence similarity [C]// Proceedings of the 2016 30th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2016: 2786-2792.
9	NECULOIU P, VERSTEEGH M, ROTARU M. Learning text similarity with Siamese recurrent networks [C]// Proceedings of the 1st Workshop on Representation Learning for NLP. Stroudsburg: ACL, 2016: 148-157. DOI: 10.18653/v1/w16-1617
10	BIAN W J, LI S, YANG Z, et al. A compare-aggregate model with dynamic-clip attention for answer selection [C]// Proceedings of the 2017 ACM Conference on Information and Knowledge Management. New York: ACM, 2017: 1987-1990. DOI: 10.1145/3132847.3133089
11	SHA L, ZHANG X D, QIAN F, et al. A multi-view fusion neural network for answer selection [C]// Proceedings of the 2018 32nd AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2018: 5422-5429.
12	YOON S, DERNONCOURT F, KIM D S, et al. A compare-aggregate model with latent clustering for answer selection [C]// Proceedings of the 2019 28th ACM International Conference on Information and Knowledge Management. New York: ACM, 2019: 2093-2096. DOI: 10.1145/3357384.3358148
13	WANG S H, JIANG J. A compare-aggregate model for matching text sequences [EB/OL]. (2016-11-06) [2019-05-05].
14	TAN M, SANTOS C DOS, XIANG B, et al. Improved representation learning for question answer matching [C]// Proceedings of the 2016 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2016: 464-473. DOI: 10.18653/v1/p16-1044
15	SANTOS C DOS, TAN M, XIANG B, et al. Attentive pooling networks [EB/OL]. (2016-02-11) [2019-07-05].
16	LASKAR M T R, HUANG J, HOQUE E. Contextualized embeddings based transformer encoder for sentence similarity modeling in answer selection task [C]// Proceedings of the 2020 12th Language Resources and Evaluation Conference. Paris: European Language Resources Association, 2020: 5505-5514.
17	PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2014: 1532-1543. DOI: 10.3115/v1/d14-1162
18	WANG M Q, SMITH N A, MITAMURA T. What is the Jeopardy model? A quasi-synchronous grammar for QA [C]// Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Stroudsburg: ACL, 2007: 22-32.
19	YANG Y, YIH W T, MEEK C. WikiQA: a challenge dataset for open-domain question answering [C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 2013-2018. DOI: 10.18653/v1/d15-1237
20	RAO J F, HE H, LIN J. Noise-contrastive estimation for answer selection with deep neural networks [C]// Proceedings of the 2016 25th ACM International on Conference on Information and Knowledge Management. New York: ACM, 2016: 1913-1916. DOI: 10.1145/2983323.2983872
21	TAY Y, TUAN L A, HUI S C. Multi-cast attention networks for retrieval-based question answering and response prediction [C]// Proceedings of the 2018 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2018: 2299-2308. DOI: 10.1145/3219819.3220048
22	SEVERYN A, MOSCHITTI A. Modeling relational information in question-answer pairs with convolutional neural networks [EB/OL]. (2016-04-05) [2019-08-12]. DOI: 10.1145/2766462.2767738
23	HE H, LIN J. Pairwise word interaction modeling with deep neural networks for semantic similarity measurement [C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2016: 937-948. DOI: 10.18653/v1/n16-1108
24	JIN Z X, ZHANG B W, ZHOU F, et al. Ranking via partial ordering for answer selection [J]. Information Sciences, 2020, 538: 358-371. DOI: 10.1016/j.ins.2020.05.110
25	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
26	HOWARD J, RUDER S. Universal language model fine-tuning for text classification [C]// Proceedings of the 2018 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2018: 328-339. DOI: 10.18653/v1/p18-1031

Dataset	Split	Questions	Question-answer pairs	Positive samples/%
TRECQA	Train	1 229	53 417	12.0
	Validation	65	1 117	18.4
	Test	68	1 442	17.2
WikiQA	Train	873	8 672	12.0
	Validation	126	1 130	12.4
	Test	243	2 351	12.5

Method	MAP/%	MRR/%
Method of Ref. [1]	72.8	83.2
Method of Ref. [2]	77.7	83.6
Method of Ref. [10]	82.1	89.9
Method of Ref. [13]	80.2	87.5
Method of Ref. [16]	75.3	85.1
Method of Ref. [20]	80.1	87.7
Method of Ref. [21]	83.8	90.4
DAMPM (with K-Max)	82.4	90.8
DAMPM (with K-Threshold)	83.7	91.5

Method	MAP/%	MRR/%
Method of Ref. [10]	75.4	76.4
Method of Ref. [13]	74.3	75.5
Method of Ref. [16]	68.7	69.6
Method of Ref. [19]	65.2	66.5
Method of Ref. [22]	69.5	71.1
Method of Ref. [23]	70.9	72.3
Method of Ref. [24]	74.6	79.2
DAMPM (with K-Max)	76.1	77.2
DAMPM (with K-Threshold)	75.9	76.7
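
The two result tables above report MAP and MRR; the values in the first are consistent with the TRECQA figures quoted in the abstract, and those in the second with the WikiQA figures. As a reminder of how the two metrics are defined, here is a minimal sketch using their standard definitions. The function names and the toy rankings are illustrative assumptions, and the handling of questions without any correct answer (scored as 0 here) may differ from the evaluation script the authors used.

```python
def average_precision(labels):
    """Average precision for one question; labels are 1/0 in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this correct answer
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first correct answer, or 0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def mean_average_precision(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)


def mean_reciprocal_rank(rankings):
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


# Toy example: two questions, candidate answers already sorted by model score.
rankings = [
    [1, 0, 1, 0],  # correct answers at ranks 1 and 3
    [0, 1, 0, 0],  # correct answer at rank 2
]
print(round(mean_average_precision(rankings), 4))  # 0.6667
print(round(mean_reciprocal_rank(rankings), 4))    # 0.75
```

MRR only rewards the position of the first correct answer, whereas MAP accounts for every correct answer in the ranking, which is why the two columns can order methods differently.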

Model structure	MRR/%	Decrease/%
w/o Full-Matching	87.2	4.700
w/o Attentive-Matching	87.4	4.480
w/o Max-Attentive-Matching	88.5	3.279
w/o ELMo	89.2	2.514
w/o GloVe	91.2	0.328
DAMPM (with K-Threshold)	91.5	0.000
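
The last column of the ablation table appears to be the drop in MRR relative to the full model, expressed in percent. The short check below recomputes it from the rounded MRR values in the table; the variable and function names are illustrative, and the small third-decimal differences for two rows are assumed to come from the table having been computed on unrounded scores.

```python
FULL_MODEL_MRR = 91.5  # DAMPM (with K-Threshold)

# Ablated variant -> (MRR from the table, decrease reported in the table)
ablations = {
    "w/o Full-Matching": (87.2, 4.700),
    "w/o Attentive-Matching": (87.4, 4.480),
    "w/o Max-Attentive Matching": (88.5, 3.279),
    "w/o ELMo": (89.2, 2.514),
    "w/o GloVe": (91.2, 0.328),
}


def relative_decrease(full, ablated):
    """Drop in score relative to the full model, in percent."""
    return (full - ablated) / full * 100.0


for name, (mrr, reported) in ablations.items():
    computed = relative_decrease(FULL_MODEL_MRR, mrr)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Read this way, removing any one of the matching strategies costs more MRR than removing either embedding source, with Full-Matching having the largest effect.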
