Journal of Computer Applications (《计算机应用》), sole official website


Nested Named Entity Recognition Method for Multi-Directional Gradient Feature Extraction

WANG Xiaoman¹, CHEN Yanping², YANG Caiwei³, HUANG Ruizhang¹, QIN Yongbin²

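  1. Guizhou University
  2. College of Computer Science and Technology, Guizhou University
  3. Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, Guizhou University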
  • Received: 2024-11-14; Revised: 2025-02-12; Accepted: 2025-02-17; Online: 2025-04-02; Published: 2025-04-02
  • Corresponding author: CHEN Yanping
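  • Funding:
    National Key Research and Development Program of China; National Natural Science Foundation of China; Key Project of the Guizhou Provincial Science and Technology Foundation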

Abstract: Nested Named Entity Recognition (NER) is a fundamental task in natural language processing. Span-based methods treat entity recognition as a span classification task, which allows them to handle nested entities effectively. Existing methods organize the spans of a sentence into a two-dimensional plane in which each cell represents one span, much as a pixel does in an image. They then borrow edge detection techniques from image processing and use gradient operators to enhance and extract the semantic edge features of entities in this planarized sentence representation. However, existing gradient-operator-based approaches overlook the multi-directional edge features between adjacent spans. To address this limitation, a nested NER method based on multi-directional gradient feature extraction is proposed. The method treats entity positions as pixels in an image and exploits the fact that edges carry gradients. It applies an eight-direction Sobel operator to the planarized sentence representation to extract more complete and more discriminative semantic edge features of entities. The proposed method achieves F1 scores of 88.01% and 81.23% on the Chinese ACE2005 dataset and the English GENIA dataset, respectively, demonstrating its effectiveness for nested NER. On the English flat NER dataset CoNLL2003, it achieves an F1 score of 92.52%, showing that the method also extends to flat NER.

Key words: nested named entity recognition, planarized sentence representation, multi-directional gradient, feature extraction, Sobel operator
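The abstract names the operator but not its exact form. As a rough illustration only, the following minimal Python sketch assumes a common construction of an eight-direction Sobel operator. The border ring of the standard 3×3 Sobel kernel is rotated in 45° steps to give kernels for 0°, 45°, 90° and so on up to 315°. Each kernel is then slid over a toy span plane in which cell (i, j) stands for the span of tokens i to j. The helper names (`span_grid`, `rotate_45`, `multi_direction_edges`, `edge_strength`) are hypothetical. The 0/1 cell values stand in for the learned span features a real model would use. Fusing the eight maps by maximum absolute response is one possible choice, not necessarily the authors'.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch (assumption): not the authors' implementation.
Grid = List[List[float]]

# Standard 3x3 Sobel kernel for the horizontal (0-degree) gradient.
SOBEL_0 = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# The 8 border cells of a 3x3 kernel, walked clockwise from the top-left.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def rotate_45(kernel: Grid, steps: int) -> Grid:
    """Rotate a 3x3 kernel by steps * 45 degrees by shifting its border ring."""
    values = [kernel[r][c] for r, c in RING]
    out = [[0] * 3 for _ in range(3)]
    out[1][1] = kernel[1][1]  # the centre does not move
    for i, (r, c) in enumerate(RING):
        out[r][c] = values[(i - steps) % 8]
    return out


def eight_direction_kernels() -> List[Grid]:
    """Sobel kernels for 0, 45, 90, ..., 315 degrees."""
    return [rotate_45(SOBEL_0, k) for k in range(8)]


def span_grid(n: int, entities: Sequence[Tuple[int, int]]) -> Grid:
    """Toy planarized sentence: cell (i, j) stands for the span of tokens i..j.

    Entity spans get 1.0 and all other cells 0.0; this scalar value is a
    hypothetical stand-in for a learned span representation.
    """
    grid = [[0.0] * n for _ in range(n)]
    for i, j in entities:
        grid[i][j] = 1.0
    return grid


def correlate3x3(grid: Grid, kernel: Grid) -> Grid:
    """3x3 cross-correlation with zero padding (same output size as grid)."""
    n, m = len(grid), len(grid[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    r, c = i + di - 1, j + dj - 1
                    if 0 <= r < n and 0 <= c < m:
                        acc += kernel[di][dj] * grid[r][c]
            out[i][j] = acc
    return out


def multi_direction_edges(grid: Grid) -> List[Grid]:
    """One gradient map per direction; cells with i > j are not spans, so zero them."""
    maps = [correlate3x3(grid, k) for k in eight_direction_kernels()]
    for mp in maps:
        for i in range(len(mp)):
            for j in range(i):
                mp[i][j] = 0.0
    return maps


def edge_strength(maps: List[Grid]) -> Grid:
    """Fuse the directions by taking the largest absolute response per cell."""
    n, m = len(maps[0]), len(maps[0][0])
    return [[max(abs(mp[i][j]) for mp in maps) for j in range(m)]
            for i in range(n)]


if __name__ == "__main__":
    # Five tokens with nested entities: span 1..3 contains span 1..1.
    demo = span_grid(5, [(1, 3), (1, 1)])
    for row in edge_strength(multi_direction_edges(demo)):
        print(row)
```

Consider a single entity at span (1, 3) of a five-token sentence. All eight neighbouring span cells receive a fused strength of 2, while the entity cell itself and more distant cells receive 0. With only the 0° and 90° kernels, the four diagonal neighbours would respond with magnitude 1 rather than 2. This is a small-scale view of why extra directions can sharpen the boundary signal around a span.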
