Journal of Computer Applications, 2025, Vol. 45, Issue (11): 3547-3554. DOI: 10.11772/j.issn.1001-9081.2024111606

• Artificial Intelligence •

Nested named entity recognition method for multi-directional gradient feature extraction

Xiaoman WANG1,2,3, Yanping CHEN1,2,3, Caiwei YANG1,2,3, Ruizhang HUANG1,2,3, Yongbin QIN1,2,3

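  1. Text Computing and Cognitive Intelligence Engineering Research Center of Ministry of Education, Guizhou University, Guiyang Guizhou 550025, China
    2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang Guizhou 550025, China
    3. College of Computer Science and Technology, Guizhou University, Guiyang Guizhou 550025, China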
  • Received: 2024-11-14 Revised: 2025-02-12 Accepted: 2025-02-17 Online: 2025-04-02 Published: 2025-11-10
  • Corresponding author: Yanping CHEN
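  • About authors: WANG Xiaoman, born in 1999, from Taiyuan, Shanxi, M. S. candidate, CCF member. Her research interests include natural language processing and information extraction.
    YANG Caiwei, born in 1997, from Zunyi, Guizhou, Ph. D. candidate. Her research interests include natural language processing.
    HUANG Ruizhang, born in 1979, from Tianjin, Ph. D., professor, CCF member. Her research interests include data fusion and analysis, text mining, network mining, and knowledge discovery.
    QIN Yongbin, born in 1980, from Yantai, Shandong, Ph. D., professor, CCF senior member. His research interests include big data governance and application, and multi-source data fusion.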
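  • Supported by:
    National Key Research and Development Program of China (2023YFC3304500); National Natural Science Foundation of China (62166007); Major Science and Technology Projects of Guizhou Province ([2024]003)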

Abstract:

Nested Named Entity Recognition (NER) is a fundamental task in natural language processing. Span-based methods treat entity recognition as span classification and can therefore handle nested entities effectively. Existing methods organize the spans of a sentence into a two-dimensional plane in which each cell represents one span, much like a pixel in an image, and then borrow edge detection from image processing, using gradient operators to enhance and extract the semantic edge features of entities in this planarized sentence representation. However, existing gradient operator-based approaches neglect the multi-directional edge features between adjacent spans. To address this limitation, a nested NER method based on multi-directional gradient feature extraction was proposed. The method treated entity positions as pixels in an image and, exploiting the fact that edges are characterized by gradients, applied an eight-direction Sobel operator to the planarized sentence representation to extract more complete and discriminative semantic edge features of entities. The proposed method achieves F1 scores of 88.01% and 81.23% on the ACE 2005 Chinese dataset and the GENIA English dataset, respectively, demonstrating its effectiveness for nested NER. It also achieves an F1 score of 92.52% on the flat English dataset CoNLL2003, validating its extensibility to flat NER.
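To make the idea of applying an eight-direction Sobel operator to a planarized sentence more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation, and it rests on several assumptions. First, each cell (i, j) with i ≤ j holds a single scalar feature of the span from token i to token j, whereas the actual method presumably operates on learned vector representations. Second, cells with i > j are zero-filled. Third, the eight kernels are obtained by rotating the border weights of the standard 3×3 Sobel kernel in 45° steps. Finally, the eight responses at each cell are simply kept as that cell's edge feature vector. The names `span_plane`, `eight_direction_sobel`, `correlate`, and `edge_features` are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]

# Border cells of a 3x3 kernel in clockwise order, starting at the top-left.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

# Standard Sobel kernel for the horizontal gradient (0-degree direction).
SOBEL_0 = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def eight_direction_sobel() -> List[Grid]:
    """Build 8 kernels by rotating the border of SOBEL_0 in 45-degree steps."""
    ring_values = [SOBEL_0[r][c] for r, c in RING]
    kernels = []
    for step in range(8):
        k = [[0, 0, 0] for _ in range(3)]  # centre weight stays 0
        for idx, (r, c) in enumerate(RING):
            # Shift every border weight `step` positions clockwise.
            k[r][c] = ring_values[(idx - step) % 8]
        kernels.append(k)
    return kernels


def span_plane(span_scores: Dict[Tuple[int, int], float], n: int) -> Grid:
    """Hypothetical planarization: cell (i, j) holds a scalar feature of span i..j.

    Cells with i > j do not correspond to any span and are zero-filled.
    """
    return [[span_scores.get((i, j), 0.0) if i <= j else 0.0 for j in range(n)]
            for i in range(n)]


def correlate(plane: Grid, kernel: Grid) -> Grid:
    """3x3 cross-correlation with zero padding; output has the plane's size."""
    n = len(plane)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    r, c = i + di - 1, j + dj - 1
                    if 0 <= r < n and 0 <= c < n:
                        acc += kernel[di][dj] * plane[r][c]
            out[i][j] = acc
    return out


def edge_features(plane: Grid) -> List[List[List[float]]]:
    """Return, for every cell, its 8 directional gradient responses."""
    maps = [correlate(plane, k) for k in eight_direction_sobel()]
    n = len(plane)
    return [[[m[i][j] for m in maps] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Toy 4-token sentence in which span (1, 2) is a hypothetical entity span.
    plane = span_plane({(1, 2): 1.0}, n=4)
    feats = edge_features(plane)
    print(feats[1][1])  # -> [2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 0.0, 1.0]
```

Rotating the border by four steps negates a kernel, so directions d and d+4 always respond with opposite signs. Keeping all eight responses preserves the sign of the gradient in every direction, including the two diagonals, whereas the usual two-kernel Sobel operator only measures horizontal and vertical changes.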

Key words: nested named entity recognition, planarized sentence representation, multi-directional gradient, feature extraction, Sobel operator
