Journal of Computer Applications, 2018, Vol. 38, Issue (7): 1846-1852. DOI: 10.11772/j.issn.1001-9081.2018010186

• Artificial Intelligence •

Semantic matching model of knowledge graph in question answering system based on transfer learning

LU Qiang, LIU Xingyu

  1. Beijing Key Laboratory of Petroleum Data Mining (China University of Petroleum-Beijing), Beijing 102249, China
  • Received: 2018-01-22 Revised: 2018-03-16 Online: 2018-07-10 Published: 2018-07-12
  • Corresponding author: LU Qiang
  • About the authors: LU Qiang (born 1977), male, from Tangshan, Hebei, is an associate professor with a Ph.D. and a CCF member; his research interests are knowledge engineering and evolutionary computation. LIU Xingyu (born 1992), male, from Langfang, Hebei, is a master's student; his research interest is natural language processing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61402532) and the Science Foundation for Youth Basic Research of China University of Petroleum-Beijing (01JB0415).

Abstract: Semantic matching between questions and relations in single-fact question answering systems rarely reaches high accuracy when only a small number of labeled samples is available. To address this problem, a transfer learning model based on Recurrent Neural Network (RNN) is proposed. First, an RNN-based sequence-to-sequence unsupervised learning algorithm learns the semantic space distribution of questions, that is, the word vectors and the RNN, from a large number of unlabeled samples by reconstructing the input sequences. Second, this distribution is transferred to the supervised semantic matching algorithm by assigning the learned values to the parameters of its neural network. Finally, the semantic matching model is trained on labeled samples, with the match computed as the inner product of the question features and the relation features. Experimental results show that, when labeled data is scarce and unlabeled data is plentiful, the proposed model improves semantic matching accuracy by an average of 5.6 and 8.8 percentage points over the supervised learning methods Embed-AVG and RNNrandom, respectively. Pre-learning the semantic space distribution from a large number of unlabeled samples can therefore clearly improve semantic matching accuracy when only small-scale labeled samples are available.

Key words: semantic matching, transfer learning, knowledge graph, question answering system, Recurrent Neural Network (RNN)
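
The transfer and matching steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a plain tanh RNN encoder, uses random values as a stand-in for the parameters that the unsupervised sequence reconstruction step would learn, and omits training entirely. The names encode, match_scores, random_matrix, pretrained, and matcher are hypothetical.

```python
import math
import random


def encode(token_ids, embed, w_in, w_rec):
    """Run a plain tanh RNN over the tokens and return the final hidden state."""
    hidden = len(w_rec)
    h = [0.0] * hidden
    for t in token_ids:
        x = embed[t]
        # The comprehension reads the previous h and builds the next one.
        h = [
            math.tanh(
                sum(x[i] * w_in[i][j] for i in range(len(x)))
                + sum(h[k] * w_rec[k][j] for k in range(hidden))
            )
            for j in range(hidden)
        ]
    return h


def match_scores(token_ids, params, relation_vecs):
    """Score each candidate relation by its inner product with the question feature."""
    q = encode(token_ids, params["embed"], params["w_in"], params["w_rec"])
    return [sum(r_j * q_j for r_j, q_j in zip(r, q)) for r in relation_vecs]


def random_matrix(rng, rows, cols, scale=0.1):
    """Build a rows x cols matrix of small Gaussian values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
vocab, dim, hidden, n_rel = 50, 8, 16, 5

# Assumption: stand-in for word vectors and RNN weights learned without labels.
pretrained = {
    "embed": random_matrix(rng, vocab, dim),
    "w_in": random_matrix(rng, dim, hidden),
    "w_rec": random_matrix(rng, hidden, hidden),
}

# Transfer by assignment: the supervised matcher starts from copies of those values.
matcher = {name: [row[:] for row in value] for name, value in pretrained.items()}

# Relation features have no pretrained counterpart here, so they start random.
relation_vecs = random_matrix(rng, n_rel, hidden)

question = [3, 17, 42, 8]  # hypothetical token ids for one question
scores = match_scores(question, matcher, relation_vecs)
best_relation = max(range(n_rel), key=scores.__getitem__)
```

In a full system the matcher's parameters and the relation vectors would then be fine-tuned on the labeled question-relation pairs; copying rather than sharing the pretrained values keeps the unsupervised model unchanged during that step.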
