Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (7): 1873-1878. DOI: 10.11772/j.issn.1001-9081.2019111895

• Artificial Intelligence •

Knowledge base question answering system based on multi-feature semantic matching

ZHAO Xiaohu1,2, ZHAO Chenglong1,2

  1. National and Local Joint Engineering Laboratory of Internet Application Technology on Mine (China University of Mining and Technology), Xuzhou Jiangsu 221008, China;
  2. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou Jiangsu 221116, China
  • Received: 2019-11-07 Revised: 2020-04-19 Online: 2020-05-19 Published: 2020-07-10
  • Corresponding author: ZHAO Chenglong
  • About the authors: ZHAO Xiaohu, born in 1976 in Xuzhou, Jiangsu, Ph.D., professor. His research interests include mine Internet of Things, mine communication monitoring and control, computer networks, and intelligent computing. ZHAO Chenglong, born in 1993 in Yancheng, Jiangsu, M.S. candidate. His research interests include machine learning and natural language processing.
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2017YFC0804400).

Abstract: Question Answering over Knowledge Base (KBQA) aims to accurately match a natural language question with triples in the Knowledge Base (KB). Traditional KBQA methods usually focus on entity recognition and predicate matching, so errors in entity recognition propagate to later steps and prevent the correct answer from being found. To address this problem, an end-to-end solution that directly matches the question against triples was proposed. The system consists of two parts: candidate triple generation and candidate triple ranking. First, candidate triples were generated by using the BM25 algorithm to compute the relevance between the question and the triples in the knowledge base. Then, the Multi-Feature Semantic Matching Model (MFSMM) was used to rank the triples: MFSMM computed semantic similarity with a Bi-directional Long Short-Term Memory network (Bi-LSTM) and character similarity with a Convolutional Neural Network (CNN), and the two similarities were fused to rank the triples. On the NLPCC-ICCPOL 2016 KBQA dataset, the proposed system achieved an average F1 of 80.35%, which is close to the best existing performance.

Key words: Knowledge Base (KB), natural language question, triple, Multi-Feature Semantic Matching Model (MFSMM), semantic similarity, character similarity
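
The candidate generation step described in the abstract can be illustrated with a minimal BM25 sketch. This is not the authors' implementation: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, the serialization of a triple as a single string, and the toy triples are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: Latin alphanumeric runs are words and each CJK ideograph
    # is its own token. The paper's actual tokenization is not stated here.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


class BM25Ranker:
    """Okapi BM25 over triples serialized as strings (illustrative only)."""

    def __init__(self, triples, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [tokenize(t) for t in triples]
        self.tfs = [Counter(d) for d in self.docs]
        self.n_docs = len(self.docs)
        # Fall back to 1.0 so an empty collection never divides by zero.
        self.avg_len = sum(len(d) for d in self.docs) / max(self.n_docs, 1) or 1.0
        # Document frequency: number of triples containing each term.
        df = Counter()
        for tf in self.tfs:
            df.update(tf.keys())
        # Smoothed IDF that stays non-negative for very common terms.
        self.idf = {
            term: math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)
            for term, n in df.items()
        }

    def score(self, question, index):
        tf = self.tfs[index]
        # Length normalization relative to the average triple length.
        norm = self.k1 * (1.0 - self.b + self.b * len(self.docs[index]) / self.avg_len)
        total = 0.0
        for term in tokenize(question):
            freq = tf.get(term, 0)
            if freq == 0:
                continue  # terms absent from the triple contribute nothing
            total += self.idf[term] * freq * (self.k1 + 1.0) / (freq + norm)
        return total

    def top_k(self, question, k=3):
        # Return indices of the k best-scoring triples, dropping zero scores.
        scored = [(self.score(question, i), i) for i in range(self.n_docs)]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for s, i in scored[:k] if s > 0.0]


# Hypothetical toy knowledge base; not taken from the NLPCC-ICCPOL dataset.
triples = [
    "Xuzhou | located in | Jiangsu",
    "Yancheng | located in | Jiangsu",
    "Bi-LSTM | type | neural network",
]
ranker = BM25Ranker(triples)
candidates = ranker.top_k("Where is Xuzhou located?", k=2)
print([triples[i] for i in candidates])
```

In a pipeline like the one the abstract outlines, the indices returned by `top_k` would be the candidate triples handed to the ranking model; how many candidates are kept is a design choice this sketch does not settle.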

CLC number: