Journal of Computer Applications
范锦涛1,2,3,陈艳平1,2,3,杨采薇1,2,3,林川1,2,3*
Abstract: Existing Contrastive Learning (CL) methods for the nested Named Entity Recognition (NER) task have two major drawbacks: 1) candidate entities obtained by greedy enumeration serve as the objects of CL, but they lack contextual semantics and boundary information; 2) enumeration produces unnecessary noise and invalid information, which increases the computational burden and weakens CL performance. To address these drawbacks, a two-stage named entity recognition framework was proposed. In the first stage, candidate entity boundaries were generated by a boundary recognition model and combined into candidate entities by a boundary integration module, reducing the number of unnecessary negative candidates. Attention cues were inserted on both sides of each candidate entity to generate the corresponding candidate entity text, allowing the model to perceive contextual semantics and boundary information. In the second stage, a bi-encoder framework mapped candidate entity texts and entity label annotations into the same vector representation space through CL, so that the objects of comparison were sentences with attention cues rather than bare candidate entities. In addition, a classification parameter matrix with label semantics was designed to enrich the model's understanding of candidate entities. Compared with the Binder method, the proposed method improved the F1 value by 1.22, 3.42 and 2.31 percentage points on the three nested datasets GENIA, ACE2005 and ACE2004, respectively, which verifies the effectiveness of the proposed method for the nested named entity recognition task.
Key words: contrastive learning, boundary information, bi-encoder, label semantics, nested named entity recognition
摘要: 针对现有对比学习(CL)方法在嵌套命名实体识别(NER)任务中存在以下两个主要缺点:1)枚举生成的候选实体作为对比学习的对象,缺失上下文语义依赖和边界信息;2)产生不必要的噪音和无效信息,增加模型的计算负担并弱化了对比学习的性能。提出两阶段命名实体识别框架。在第一阶段,通过边界识别模型生成候选实体边界,并通过边界集成模块生成候选实体,减少不必要的负候选实体的生成;同时,在候选实体两侧插入注意力线索,生成对应的候选实体文本,使得模型能够感知上下文语义和边界信息。在第二阶段,提出一个双编码框架用于识别实体,通过对比学习将候选实体文本和实体类型注释映射到相同向量表征空间中,对比的对象不再是候选实体,而是带有注意力线索的句子。此外,设计带有标签语义的分类参数矩阵,丰富模型对候选实体的理解能力。与Binder方法相比,所提方法在GENIA、ACE2005和ACE2004三种嵌套数据集上,性能提升了1.22个百分点,3.42个百分点,2.31个百分点的F1值,验证了所提方法对嵌套命名实体识别任务的有效性。
关键词: 对比学习, 边界信息, 双编码器, 标签语义, 嵌套命名实体识别
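The first-stage step described in the abstract, inserting attention cues on both sides of a candidate entity to produce a candidate entity text, can be illustrated with a minimal sketch. This is not the authors' implementation: the marker strings `<e>` and `</e>`, the function names, and the inclusive token-index span format are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical cue tokens; the abstract does not say which markers are used.
OPEN_CUE = "<e>"
CLOSE_CUE = "</e>"


def insert_attention_cues(tokens: List[str], span: Tuple[int, int]) -> List[str]:
    """Wrap the candidate span (inclusive token indices) in cue tokens."""
    start, end = span
    if not (0 <= start <= end < len(tokens)):
        raise ValueError(f"invalid span {span} for {len(tokens)} tokens")
    return (
        tokens[:start]
        + [OPEN_CUE]
        + tokens[start : end + 1]
        + [CLOSE_CUE]
        + tokens[end + 1 :]
    )


def candidate_entity_texts(
    tokens: List[str], spans: List[Tuple[int, int]]
) -> List[str]:
    """Build one cue-marked sentence per candidate span.

    Each output keeps the full sentence, so the context around the
    candidate and its boundaries are both visible to an encoder.
    """
    return [" ".join(insert_attention_cues(tokens, s)) for s in spans]


# Two nested candidates over the same (made-up) sentence.
sentence = ["the", "IL-2", "gene", "promoter"]
texts = candidate_entity_texts(sentence, [(1, 2), (1, 3)])
for t in texts:
    print(t)
```

Because every candidate yields a full sentence rather than an isolated span, nested candidates that share tokens are still distinguishable by where the cues fall. In the second stage, texts like these would be encoded and compared against encoded label annotations.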
CLC Number: TP391.1
范锦涛 陈艳平 杨采薇 林川. 结合边界信息的对比学习嵌套命名实体识别[J]. 《计算机应用》唯一官方网站, DOI: 10.11772/j.issn.1001-9081.2024101525.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024101525