Knowledge base question answering system based on multi-feature semantic matching

DOI: 10.11772/j.issn.1001-9081.2019111895

Journal of Computer Applications, 2020, Vol. 40, Issue 7: 1873-1878.

ZHAO Xiaohu^1,2, ZHAO Chenglong^1,2

1. National and Local Joint Engineering Laboratory of Internet Application Technology on Mine (China University of Mining and Technology), Xuzhou Jiangsu 221008, China;
2. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou Jiangsu 221116, China

Received: 2019-11-07 Revised: 2020-04-19 Online: 2020-05-19 Published: 2020-07-10
Corresponding author: ZHAO Chenglong
About the authors: ZHAO Xiaohu (1976-), male, born in Xuzhou, Jiangsu, professor, Ph.D. His research interests include mine Internet of Things, mine communication monitoring and control, computer networks, and intelligent computing. ZHAO Chenglong (1993-), male, born in Yancheng, Jiangsu, M.S. candidate. His research interests include machine learning and natural language processing.
Supported by:
This work is partially supported by the National Key Research and Development Program of China (2017YFC0804400).

Abstract: The task of Question Answering over Knowledge Base (KBQA) aims to accurately match natural language questions with triples in a Knowledge Base (KB). Traditional KBQA methods usually focus on entity recognition and predicate matching, so errors made during entity recognition propagate to later stages and prevent the system from finding the correct answer. To solve this problem, an end-to-end solution was proposed that matches questions directly against triples. The system consists of two parts: candidate triple generation and candidate triple ranking. Firstly, candidate triples were generated by using the BM25 algorithm to compute the relevance between the question and the triples in the knowledge base. Then, a Multi-Feature Semantic Matching Model (MFSMM) was used to rank the candidates: MFSMM computes semantic similarity with a Bi-directional Long Short-Term Memory network (Bi-LSTM) and character similarity with a Convolutional Neural Network (CNN), and fuses the two similarities to rank the triples. On the NLPCC-ICCPOL 2016 KBQA dataset, the proposed system achieves an average F1 of 80.35%, which is close to the best existing performance.

Key words: Knowledge Base (KB), natural language question, triple, Multi-Feature Semantic Matching Model (MFSMM), semantic similarity, character similarity
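The candidate triple generation step can be illustrated with a minimal Okapi BM25 retriever. This is a sketch under stated assumptions, not the authors' implementation: the character-level tokenization, the choice to index only subject + predicate (treating the object as the answer), the parameters k1 = 1.5 and b = 0.75, the non-negative IDF variant, and the helper names `BM25`, `char_tokens` and `candidate_triples` are all illustrative. The toy triples are hypothetical and are not taken from the NLPCC-ICCPOL 2016 data.

```python
import math
from collections import Counter


def char_tokens(text):
    # Assumption: character-level tokens with whitespace dropped (no word segmenter).
    return [ch for ch in text if not ch.isspace()]


class BM25:
    """Okapi BM25 over an in-memory list of tokenized documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(d) for d in docs]            # term frequency per document
        self.lengths = [len(d) for d in docs]
        self.avgdl = sum(self.lengths) / len(docs)       # average document length
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))    # document frequency per term
        # Non-negative IDF variant: ln((N - df + 0.5) / (df + 0.5) + 1)
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1) for t, c in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lengths[i]
        # Length normalisation term: k1 * (1 - b + b * |D| / avgdl)
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        s = 0.0
        for t in query:
            f = tf[t]  # Counter returns 0 for terms absent from this document
            if f:
                s += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return s


def candidate_triples(question, triples, top_k=3):
    # Hypothetical helper: index subject + predicate only; the object is the answer.
    bm25 = BM25([char_tokens(s + p) for s, p, _ in triples])
    q = char_tokens(question)
    ranked = sorted(range(len(triples)), key=lambda i: bm25.score(q, i), reverse=True)
    return [triples[i] for i in ranked[:top_k]]


if __name__ == "__main__":
    # Toy triples (hypothetical; objects are placeholders, not real answers).
    toy = [("北京", "人口", "o1"), ("北京", "面积", "o2"),
           ("上海", "人口", "o3"), ("长江", "长度", "o4")]
    print(candidate_triples("北京的人口是多少？", toy, top_k=2))
    # → [('北京', '人口', 'o1'), ('北京', '面积', 'o2')]
```

In this toy run, 北京面积 and 上海人口 each match two characters of the question and receive equal scores; Python's sort is stable even with `reverse=True`, so the tie is broken by input order. In the full system, the top-k candidates returned by this stage would then be re-ranked by MFSMM, which is not sketched here.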

CLC number: TP391

ZHAO Xiaohu, ZHAO Chenglong. Knowledge base question answering system based on multi-feature semantic matching[J]. Journal of Computer Applications, 2020, 40(7): 1873-1878.
