Journal of Computer Applications (《计算机应用》), sole official website


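Optimization method for sequence labeling combined with entity boundary offset

Yu Jing (余婧)1, Chen Yanping (陈艳平)1, Hu Ying (扈应)1, Huang Ruizhang (黄瑞章)2, Qin Yongbin (秦永彬)2

  1. College of Computer Science and Technology, Guizhou University
  2. Guizhou University
  • Received: 2024-07-23 Revised: 2024-10-12 Online: 2024-11-19 Published: 2024-11-19
  • Corresponding author: Yu Jing (余婧)
  • Supported by:
    Key Project of the Science and Technology Foundation of Guizhou Province; National Key Research and Development Program of China; National Natural Science Foundation of China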

Abstract: Sequence labeling models for Named Entity Recognition (NER) often predict entity boundaries that deviate in position from the true entity boundaries. To address this problem, a sequence labeling optimization method that incorporates entity boundary offsets is proposed. First, the concept of boundary offset is introduced to quantify the positional relationship between each word and the entity boundaries: the relative offset from each word to its nearest entity boundary is calculated, and these offsets are used to generate candidate spans for the entity boundaries. Next, Intersection-over-Union (IoU) is used as a filtering criterion to discard low-quality candidate spans and retain those most likely to represent entity boundaries. Finally, a boundary adjustment module updates the positions of entity boundaries in the label sequence according to the candidate spans, thereby refining the entity boundaries across the whole label sequence and improving entity recognition performance. The method achieves F1 scores of 80.48%, 96.42%, and 94.80% on the CLUENER2020, Resume-zh, and MSRA datasets, respectively, which verifies its effectiveness for the NER task.

Key words: named entity recognition, sequence labeling, boundary offset, Intersection-over-Union (IoU), boundary adjustment
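
The three steps named in the abstract (offset-based candidate spans, IoU filtering, boundary adjustment) can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the model details. The sketch assumes BIO labels, signed per-token offsets `(d_start, d_end)` such that token `i` proposes the span `(i + d_start, i + d_end)`, an IoU computed between each candidate and the span decoded from the label sequence, a threshold of 0.5, and a majority vote among the surviving candidates. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-Union of two inclusive token spans."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def candidate_spans(offsets: List[Tuple[int, int]]) -> Counter:
    """Count how many tokens propose each span (assumed voting scheme)."""
    n = len(offsets)
    votes: Counter = Counter()
    for i, (d_start, d_end) in enumerate(offsets):
        start, end = i + d_start, i + d_end
        if 0 <= start <= end < n:  # ignore proposals outside the sentence
            votes[(start, end)] += 1
    return votes


def decode_bio(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Extract (start, end, type) entities from a BIO label sequence."""
    entities = []
    start, etype = None, None
    for i, tag in enumerate(labels + ["O"]):  # sentinel flushes the last entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # still inside the current entity
        if start is not None:
            entities.append((start, i - 1, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def adjust_boundaries(labels: List[str],
                      offsets: List[Tuple[int, int]],
                      iou_threshold: float = 0.5) -> List[str]:
    """Move each labeled entity to the most-voted candidate that passes the IoU filter."""
    votes = candidate_spans(offsets)
    adjusted = ["O"] * len(labels)
    for start, end, etype in decode_bio(labels):
        best, best_votes = (start, end), 0  # keep the original span if nothing passes
        for cand, count in votes.items():
            if span_iou((start, end), cand) >= iou_threshold and count > best_votes:
                best, best_votes = cand, count
        adjusted[best[0]] = "B-" + etype
        for i in range(best[0] + 1, best[1] + 1):
            adjusted[i] = "I-" + etype
    return adjusted


# Toy example: the tagger labels tokens 1..2, while most offsets point to 1..3.
labels = ["O", "B-PER", "I-PER", "O", "O", "O"]
offsets = [(1, 3), (0, 2), (-1, 1), (-2, 0), (-3, -1), (-1, 0)]
result = adjust_boundaries(labels, offsets)
print(result)
```

In this toy input, five tokens propose the span (1, 3) and one noisy token proposes (4, 5). The noisy candidate has no overlap with the labeled span (1, 2) and is filtered out, so the entity is extended by one token to the right. The threshold and the voting rule are illustrative choices only.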

CLC number: