Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (7): 2010-2016.DOI: 10.11772/j.issn.1001-9081.2022071133

• The 39th CCF National Database Conference (NDBC 2022) •

Prompt learning based unsupervised relation extraction model

Menglin HUANG1, Lei DUAN1,2(), Yuanhao ZHANG1, Peiyan WANG1, Renhao LI1   

  1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
     2. Med-X Center for Informatics, Sichuan University, Chengdu, Sichuan 610041, China
  • Received: 2022-07-12 Revised: 2022-08-11 Accepted: 2022-08-22 Online: 2022-09-23 Published: 2023-07-10
  • Contact: Lei DUAN
  • About author: HUANG Menglin, born in 1998 in Chongqing, M. S. candidate, CCF member. Her research interests include natural language processing.
    DUAN Lei, born in 1981 in Chengdu, Sichuan, Ph. D., professor, CCF senior member. His research interests include data mining.
    ZHANG Yuanhao, born in 1999 in Ningguo, Anhui, M. S. candidate, CCF member. His research interests include natural language processing.
    WANG Peiyan, born in 1997 in Tonghua, Jilin, M. S. candidate, CCF member. Her research interests include knowledge graph completion.
    LI Renhao, born in 1997 in Beijing, M. S. candidate, CCF member. His research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61972268); Fusion Innovation Project of the Med-X Center for Informatics, Sichuan University (YGJC001)


Abstract:

Unsupervised relation extraction aims to extract the semantic relations between entities from unlabeled natural language text. Currently, unsupervised relation extraction models based on the Variational Auto-Encoder (VAE) architecture provide supervision signals for model training through a reconstruction loss, which offers a new way to complete unsupervised relation extraction tasks. To address the problems that such models cannot effectively understand contextual information and rely on dataset inductive biases, a Prompt learning based Unsupervised Relation Extraction (PURE) model was proposed, which includes a relation extraction module and a link prediction module. In the relation extraction module, a context-aware prompt template function was designed to fuse contextual information, and the unsupervised relation extraction task was converted into a mask prediction task, so that the knowledge acquired during the pre-training phase could be fully exploited for relation extraction. In the link prediction module, supervision signals were provided for the relation extraction module by predicting the missing entity in a relation triple, thereby assisting model training. Extensive experiments were carried out on two public real-world relation extraction datasets. The results show that the PURE model uses contextual information effectively, does not rely on dataset inductive biases, and improves the B-cubed F1 metric by 3.3 percentage points on the NYT dataset compared with UREVA (Variational Autoencoder-based Unsupervised Relation Extraction model), the state-of-the-art VAE-based model.
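The two modules summarized above can be sketched in miniature: a cloze-style prompt template that recasts relation extraction as mask prediction, a verbalizer that maps the predicted mask token to a relation cluster, and a triple-scoring function standing in for link prediction. This is a minimal illustrative sketch only; the template wording, the verbalizer entries, and the TransE-style distance score are all assumptions made here for illustration, not the paper's actual design.

```python
import math

def prompt_template(sentence: str, head: str, tail: str,
                    mask_token: str = "[MASK]") -> str:
    """Context-aware prompt: keep the original sentence as context and
    append a cloze pattern, turning relation extraction into mask
    prediction for a pre-trained language model."""
    return (f"{sentence} In this sentence, the relation between "
            f"{head} and {tail} is {mask_token}.")

def verbalize(predicted_token: str, verbalizer: dict) -> str:
    """Map the token predicted at the mask position to a relation cluster
    (a hypothetical token-to-relation mapping)."""
    return verbalizer.get(predicted_token, "no_relation")

def link_prediction_score(head_vec, rel_vec, tail_vec) -> float:
    """TransE-style plausibility of a triple (head, relation, tail):
    the closer ||h + r - t|| is to zero, the more plausible the triple.
    Scoring candidate fillers for a missing entity in this way is one
    possible source of the supervision signal described in the abstract."""
    return -math.sqrt(sum((h + r - t) ** 2
                          for h, r, t in zip(head_vec, rel_vec, tail_vec)))

# Usage with hypothetical values:
verbalizer = {"founder": "/business/company/founders",
              "birthplace": "/people/person/place_of_birth"}
prompt = prompt_template("Steve Jobs co-founded Apple in 1976.",
                         "Steve Jobs", "Apple")
print(prompt)
print(verbalize("founder", verbalizer))
# A perfectly fitting triple scores best (distance 0); worse fits are negative.
print(link_prediction_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
```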

Key words: unsupervised relation extraction, Prompt learning, Variational Auto-Encoder (VAE), Pre-trained Language Model (PLM), unsupervised learning

