Journal of Computer Applications, 2017, Vol. 37, Issue (6): 1741-1746. DOI: 10.11772/j.issn.1001-9081.2017.06.1741

• Artificial Intelligence •

Sentence composition model for reading comprehension

WANG Yuanlong

  1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China
  • Received: 2016-11-21 Revised: 2017-02-06 Online: 2017-06-10 Published: 2017-06-14
  • Corresponding author: WANG Yuanlong
  • About the author: WANG Yuanlong (1983-), male, born in Datong, Shanxi, lecturer, Ph.D., CCF member. His research interests include virtual reality, natural language processing and high performance computing.
  • Supported by:
    This work is partially supported by the National High Technology Research and Development Program (863 Program) of China (2015AA015407) and the Natural Science Foundation of Shanxi Province (201601D102030).

Abstract: Reading comprehension of a document requires the combined use of Natural Language Processing (NLP) techniques such as text representation, understanding and reasoning. Aiming at the multiple-choice questions on literary reading comprehension in the Chinese-language paper of the college entrance examination, a sentence composition model based on a hierarchical composition pattern was proposed to compute semantic consistency at the sentence level. Firstly, a neural network model was trained on triples consisting of single-word vectors and phrase vectors. Then, the trained neural network model was used to compose sentence vectors by one of two composition methods, a recursive method or a recurrent method, yielding a distributed vector representation of each sentence. The consistency between two sentences was represented by the cosine similarity of their sentence vectors. To verify the proposed method, 769 simulation materials and 13 sets of Beijing college entrance examination Chinese-paper materials (each including the source text and the choice questions) were collected as the test set. The experimental results show that, compared with the best traditional method, which is based on HowNet semantics, the precision of the proposed recurrent method is 7.8 percentage points higher on the college entrance examination materials and 2.7 percentage points higher on the simulation materials.
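The abstract states only that a neural network is trained on triples built from single-word vectors and phrase vectors. What follows is a minimal sketch under stated assumptions, not the paper's implementation: each triple is taken to be (left word vector, right word vector, phrase vector), the network is assumed to be a single layer p = tanh(W[a; b] + c), and it is fitted by plain stochastic gradient descent on squared error. The class `Composer`, the helpers `mean_loss` and `train`, the dimension, the learning rate and the toy vectors are all hypothetical.

```python
import math
import random


class Composer:
    """Hypothetical composition network: p = tanh(W [a; b] + c)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # W is dim x (2 * dim); c is a bias vector of length dim.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)]
                  for _ in range(dim)]
        self.c = [0.0] * dim

    def compose(self, a, b):
        """Map two child vectors to one parent vector of the same dimension."""
        x = a + b  # list concatenation gives [a; b]
        return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + ci)
                for row, ci in zip(self.W, self.c)]

    def step(self, a, b, target, lr):
        """One SGD step on 0.5 * ||compose(a, b) - target||^2; returns the pre-update loss."""
        x = a + b
        p = self.compose(a, b)
        loss = 0.5 * sum((pi - ti) ** 2 for pi, ti in zip(p, target))
        for i in range(self.dim):
            # dL/dz_i = (p_i - t_i) * (1 - p_i^2), because tanh'(z) = 1 - tanh(z)^2
            g = (p[i] - target[i]) * (1.0 - p[i] ** 2)
            for j in range(2 * self.dim):
                self.W[i][j] -= lr * g * x[j]
            self.c[i] -= lr * g
        return loss


def mean_loss(model, triples):
    """Mean squared-error loss of the model over (left, right, phrase) triples."""
    return sum(0.5 * sum((p - t) ** 2 for p, t in zip(model.compose(a, b), phrase))
               for a, b, phrase in triples) / len(triples)


def train(model, triples, epochs=200, lr=0.1):
    """Plain SGD over the triples for a fixed number of epochs."""
    for _ in range(epochs):
        for a, b, phrase in triples:
            model.step(a, b, phrase, lr)
    return model


# Toy 3-dimensional triples (hypothetical values, not data from the paper).
triples = [
    ([0.5, 0.1, 0.0], [0.2, 0.4, 0.1], [0.4, 0.3, 0.1]),
    ([0.0, 0.3, 0.6], [0.1, 0.0, 0.5], [0.1, 0.2, 0.6]),
]
model = Composer(dim=3)
initial_loss = mean_loss(model, triples)
train(model, triples)
final_loss = mean_loss(model, triples)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the parent vector in this sketch has the same dimension as its children, the trained `compose` can be applied again to its own output, which is what lets a model fitted on word-to-phrase triples be reused to build whole-sentence vectors.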
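The two ways of turning word vectors into a sentence vector with a trained composition function can be contrasted in a short sketch. Here `toy_compose` is a hypothetical stand-in for the trained network, the binary bracketing of the sentence is assumed to be given in advance, and the function names and toy vectors are illustrative only; the abstract does not specify how the paper obtains its tree structure.

```python
import math
from functools import reduce


def toy_compose(a, b):
    """Stand-in for the trained composition network (hypothetical): tanh of the mean."""
    return [math.tanh(0.5 * (x + y)) for x, y in zip(a, b)]


def compose_recursive(tree, emb, compose):
    """Recursive method (sketch): compose bottom-up over a binary tree.

    A tree is either a word (str) or a 2-tuple (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return emb[tree]
    left, right = tree
    return compose(compose_recursive(left, emb, compose),
                   compose_recursive(right, emb, compose))


def compose_recurrent(words, emb, compose):
    """Recurrent method (sketch): fold left to right, h_t = compose(h_{t-1}, v(w_t))."""
    vectors = [emb[w] for w in words]
    return reduce(compose, vectors[1:], vectors[0])


# Toy 2-dimensional word vectors (hypothetical values).
emb = {"spring": [0.2, 0.4], "has": [0.1, -0.3], "come": [0.5, 0.0]}
words = ["spring", "has", "come"]
right_branching = ("spring", ("has", "come"))
left_branching = (("spring", "has"), "come")

recursive_vec = compose_recursive(right_branching, emb, toy_compose)
recurrent_vec = compose_recurrent(words, emb, toy_compose)
# A left-branching tree reproduces the recurrent order exactly.
assert compose_recursive(left_branching, emb, toy_compose) == recurrent_vec
```

In this sketch the recurrent method is a left fold, so it coincides with the recursive method on a left-branching tree and gives a different vector for other bracketings, such as the right-branching tree used above.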
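Sentence-level consistency is measured as the cosine similarity of two sentence vectors, cos(u, v) = u·v / (‖u‖‖v‖). A minimal sketch follows; the helper `option_consistency`, which scores an answer option by its best-matching source sentence, is a hypothetical aggregation rule added for illustration, since the abstract does not state how sentence scores are turned into an answer.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sentence vectors; returns 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def option_consistency(option_vec, source_vecs):
    """Hypothetical aggregation: score an option by its most similar source sentence."""
    return max(cosine(option_vec, s) for s in source_vecs)


# Toy sentence vectors (hypothetical values).
source_vecs = [[0.0, 1.0], [1.0, 1.0]]
score = option_consistency([1.0, 0.0], source_vecs)
```

Zero vectors are mapped to 0.0 here only to avoid division by zero; taking the maximum over source sentences is one possible rule among several, and other aggregations (for example an average) would fit the same interface.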

Key words: natural language comprehension, sentence composition model, reading comprehension, semantic similarity computation
