Journal of Computer Applications, 2019, Vol. 39, Issue (7): 1918-1924. DOI: 10.11772/j.issn.1001-9081.2019010182

• Artificial Intelligence •

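Joint entity and relation extraction model based on reinforcement learning

CHEN Jiafeng, TENG Chong   

  1. School of Cyber Science and Engineering, Wuhan University, Wuhan 430072
  • Received: 2019-01-24 Revised: 2019-03-21 Online: 2019-04-15 Published: 2019-07-10
  • Corresponding author: TENG Chong
  • About the authors: CHEN Jiafeng (born 1995), male, from Wuhan, Hubei; M.S. candidate; research interests: data mining and deep learning. TENG Chong (born 1974), female, from Wuhan, Hubei; associate professor, Ph.D.; research interests: data mining, big data, deep learning, and public opinion analysis.
  • Funding:

    General Program of the National Natural Science Foundation of China (61772378).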

Joint entity and relation extraction model based on reinforcement learning

CHEN Jiafeng, TENG Chong   

  1. School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei 430072, China
  • Received: 2019-01-24 Revised: 2019-03-21 Online: 2019-04-15 Published: 2019-07-10
  • Supported by:

    This work is partially supported by the General Program of the National Natural Science Foundation of China (61772378).

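Abstract (translated from the Chinese):

To address the label noise in existing entity and relation extraction methods based on distant supervision, a joint entity and relation extraction method based on reinforcement learning is proposed. The model has two modules: a sentence selector module and a joint entity and relation extraction module. First, the sentence selector module selects high-quality sentences free of label noise and feeds them to the joint extraction module. Then, the joint extraction module predicts on the input sentences using sequence labeling and provides feedback to the sentence selector module, guiding it to pick high-quality sentences. Finally, the two modules are trained at the same time, so that sentence selection and sequence labeling are optimized together. Experimental results show that the model achieves an F1 value of 47.3% on joint entity and relation extraction, 1% higher than joint extraction models represented by CoType and 14% higher than pipeline models represented by LINE. The results show that combining reinforcement learning with a joint entity and relation extraction model effectively improves the F1 value of the sequence labeling model, and that the sentence selector handles noise in the data effectively.

Key words: reinforcement learning, joint extraction, sequence labeling, named entity recognition, relation classification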

Abstract:

Existing entity and relation extraction methods that rely on distant supervision suffer from the noisy labeling problem. To reduce the impact of noisy data, a model for joint entity and relation extraction based on reinforcement learning was proposed. The model had two modules: a sentence selector module and a sequence labeling module. Firstly, the sentence selector module selected high-quality sentences without labeling noise and fed them into the sequence labeling module. Secondly, the sequence labeling module made predictions and provided rewards to the sentence selector module to help it select high-quality sentences. Finally, the two modules were trained jointly to optimize sentence selection and sequence labeling together. The experimental results show that the F1 value of the proposed model is 47.3% in joint entity and relation extraction, which is 1% higher than that of joint extraction models represented by CoType and 14% higher than that of serial models represented by LINE (Large-scale Information Network Embedding). The results show that combining reinforcement learning with a joint entity and relation extraction model can effectively improve the F1 value of the sequence labeling model, and that the sentence selector can effectively deal with noise in the data.
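The interaction between the two modules can be illustrated with a minimal policy-gradient (REINFORCE) sketch. It is not the authors' implementation. The two-dimensional sentence features, the logistic selection policy, the `toy_extractor_reward` function (a stand-in for the feedback of the sequence labeling module), and all hyperparameters are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def select_prob(weights, features):
    """Probability that the selector keeps a sentence (logistic policy, assumed)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def toy_extractor_reward(is_noisy):
    """Hypothetical stand-in for the sequence labeling module's feedback.

    A real system would derive the reward from the tagger's predictions on
    the selected sentence; here it is simply +1 for a clean label and -1
    for a noisy one.
    """
    return -1.0 if is_noisy else 1.0


def train_selector(sentences, epochs=200, lr=0.5, seed=0):
    """Train the selection policy with REINFORCE on (features, is_noisy) pairs."""
    rng = random.Random(seed)
    weights = [0.0] * len(sentences[0][0])
    for _ in range(epochs):
        for features, is_noisy in sentences:
            p = select_prob(weights, features)
            action = 1 if rng.random() < p else 0  # 1 = keep the sentence
            # Unselected sentences never reach the tagger, so they earn no reward.
            reward = toy_extractor_reward(is_noisy) if action else 0.0
            # Gradient of log pi(action | features) for a logistic policy
            # is (action - p) * features.
            for i, x in enumerate(features):
                weights[i] += lr * reward * (action - p) * x
    return weights


# Toy data (assumed): the first feature marks clean sentences, the second noisy ones.
SENTENCES = [
    ([1.0, 0.0], False),
    ([0.0, 1.0], True),
    ([1.0, 0.0], False),
    ([0.0, 1.0], True),
]

weights = train_selector(SENTENCES)
print("P(keep | clean-looking):", select_prob(weights, [1.0, 0.0]))
print("P(keep | noisy-looking):", select_prob(weights, [0.0, 1.0]))
```

In this toy setting the selector learns to keep sentences that resemble the clean examples and to drop those that resemble the noisy ones. In the setting the abstract describes, the noise flag is not observable and the reward would come only from the sequence labeling module.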

Key words: reinforcement learning, joint extraction, sequence tagging, named entity recognition, relation classification

CLC number: