Journal of Computer Applications, 2022, Vol. 42, Issue (6): 1796-1801. DOI: 10.11772/j.issn.1001-9081.2021091747

• The 18th CCF Conference on Web Information Systems and Applications

Relation extraction method based on entity boundary combination

Hao LI1,2, Yanping CHEN1,2, Ruixue TANG1,2, Ruizhang HUANG1,2, Yongbin QIN1,2, Guorong WANG1,2, Xi TAN3

  1. College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
  2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang, Guizhou 550025, China
  3. Guizhou Qingduo Technology Company Limited, Guiyang, Guizhou 550025, China
  • Received: 2021-10-12 Revised: 2021-11-11 Accepted: 2021-11-17 Online: 2022-04-15 Published: 2022-06-10
  • Contact: Yanping CHEN
  • About the authors: LI Hao, born in 1996, M. S. candidate. His research interests include natural language processing and relation extraction.
    TANG Ruixue, born in 1987, Ph. D. candidate. Her research interests include natural language processing.
    HUANG Ruizhang, born in 1979, Ph. D., professor. Her research interests include data mining, text mining, machine learning, and information retrieval.
    QIN Yongbin, born in 1980, Ph. D., professor. His research interests include intelligent computing, machine learning, and algorithm design.
    WANG Guorong, born in 1995, Ph. D. candidate. Her research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62066008); Key Project of Science and Technology Foundation of Guizhou Province (Qianke Hejichu [2020] 1Z055)

Relation extraction method based on entity boundary combination

LI Hao1,2, CHEN Yanping1,2, TANG Ruixue1,2, HUANG Ruizhang1,2, QIN Yongbin1,2, WANG Guorong1,2, TAN Xi3

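  1. College of Computer Science and Technology, Guizhou University, Guiyang 550025
  2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang 550025
  3. Guizhou Qingduo Technology Company Limited, Guiyang 550025
  • Corresponding author: CHEN Yanping
  • About the authors: LI Hao (born 1996), male, from Chengdu, Sichuan, M. S. candidate, CCF member. Main research interests: natural language processing, relation extraction.
    TANG Ruixue (born 1987), female, from Guiyang, Guizhou, Ph. D. candidate. Main research interests: natural language processing.
    HUANG Ruizhang (born 1979), female, from Tianjin, professor, Ph. D., CCF member. Main research interests: data mining, text mining, machine learning, information retrieval.
    QIN Yongbin (born 1980), male, from Zhaoyuan, Shandong, professor, Ph. D., CCF member. Main research interests: intelligent computing, machine learning, algorithm design.
    WANG Guorong (born 1995), female, from Weng'an, Guizhou, Ph. D. candidate. Main research interests: natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62066008); Key Project of Science and Technology Foundation of Guizhou Province (Qianke Hejichu [2020] 1Z055)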

Abstract:

Relation extraction aims to extract the semantic relations between entities from text. As the upstream task of relation extraction, entity recognition generates errors that spread to relation extraction, resulting in cascading errors. Compared with entities, entity boundaries have finer granularity and less ambiguity, making them easier to recognize. Therefore, a relation extraction method based on entity boundary combination was proposed, which realizes relation extraction by skipping the entities and combining the entity boundaries in pairs. Since boundary recognition performs better than entity recognition, the problem of error propagation was alleviated. In addition, the performance was further improved by adding the type features and position features of entities through the feature combination method, which further reduced the impact of error propagation. Experimental results on the ACE 2005 English dataset show that the proposed method outperforms the table-sequence encoders method by 8.61 percentage points in macro-averaged F1-score.

Key words: relation extraction, entity recognition, cascading error, entity boundary combination, feature combination

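Abstract:

Relation extraction aims to extract the semantic relations between entities from text. Entity recognition is the upstream task of relation extraction, and the errors it produces propagate to relation extraction, causing cascading errors. Compared with entities, entity boundaries are finer-grained and less ambiguous, and are therefore easier to recognize. A relation extraction method based on entity boundary combination is therefore proposed, which skips the entities and performs relation extraction by combining entity boundaries in pairs. Because boundary performance is higher than entity performance, the error propagation problem is alleviated. Furthermore, entity type features and position features are added to the model through feature combination, which further improves performance and again reduces the impact of error propagation. Experimental results show that, on the ACE 2005 English dataset, the macro-averaged F1-score of the proposed method is 8.61 percentage points higher than that of the table-sequence encoders method.

Key words: relation extraction, entity recognition, cascading error, entity boundary combination, feature combination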
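
The abstract describes relation extraction as "combining the entity boundaries in pairs" but does not spell out the procedure. The sketch below is only one plausible reading of that step, not the authors' implementation: the `Boundary` structure, the `combine_boundaries` function, and the example sentence are all hypothetical, and the classifier that would decide which pairs actually express a relation is not shown.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Boundary(NamedTuple):
    """A predicted entity boundary (hypothetical structure, not from the paper)."""
    index: int  # token position in the sentence
    kind: str   # "start" or "end"


def combine_boundaries(boundaries: List[Boundary]) -> List[Tuple[Boundary, Boundary]]:
    """Pair every two distinct boundaries as relation candidates.

    Assumption: each pair would later be scored by a relation classifier,
    with most pairs expected to be labeled "no relation".
    """
    ordered = sorted(set(boundaries))      # deduplicate, then sort by (index, kind)
    return list(combinations(ordered, 2))  # each unordered pair exactly once


# Hypothetical sentence and boundary predictions, for illustration only.
tokens = ["Smith", "works", "for", "Acme", "Corp"]
predicted = [
    Boundary(0, "start"), Boundary(0, "end"),  # "Smith"
    Boundary(3, "start"), Boundary(4, "end"),  # "Acme Corp"
]

candidates = combine_boundaries(predicted)
for left, right in candidates:
    print(tokens[left.index], left.kind, "|", tokens[right.index], right.kind)
```

Under this reading, no full entity span has to be assembled before candidates are formed, so an error on one boundary of an entity does not by itself remove the pairs built from its other boundary.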

CLC Number: