Journal of Computer Applications, 2025, Vol. 45, Issue (9): 2798-2805. DOI: 10.11772/j.issn.1001-9081.2024081159

• Artificial intelligence •

Nested named entity recognition model for wind power equipment based on differential boundary enhancement

Dengran REN, Shuying WANG

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu Sichuan 611756, China
  • Received: 2024-08-16 Revised: 2024-10-05 Accepted: 2024-10-16 Online: 2024-11-07 Published: 2025-09-10
  • Contact: Shuying WANG
  • About author: REN Dengran, born in 1999, a native of Dazhou, Sichuan, M. S. candidate. His research interests include natural language processing and knowledge graphs.
  • Supported by:
    National Key Research and Development Program of China (2022YFC3005200); Major Science and Technology Special Project of Sichuan Province (2022ZDZX0003)

Abstract:

To address the highly nested entities and long texts characteristic of the wind power equipment domain, a nested Named Entity Recognition model based on Differential Boundary Enhancement (DBE-NER) was proposed. Firstly, a semantic encoder module was used to obtain feature representations that fuse the head and tail words of entities, entity types, and relative distances, thereby enhancing the model's ability to capture nested semantic features. Secondly, an efficient differential semantic encoding module was designed to resolve the ambiguity of nested entity boundaries. Thirdly, a Grouped Dilated Attention Network (GDAN) was used to improve the recognition of long-text entities, nested entities, and nested boundaries. Finally, the feature score matrix was fed into a span decoder to obtain the positions and categories of the entities. Experimental results show that, on WPEF, a manually annotated fault dataset from a large wind power energy enterprise, the F1 score of DBE-NER is 0.92% and 1.07% higher than those of the DiFiNet (Differentiation and Filtration Network) and CNN-NER (Convolutional Neural Network for Named Entity Recognition) models, respectively, and that DBE-NER also achieves higher F1 scores on multiple public datasets.
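
The last step of the pipeline, decoding a score matrix into entity spans, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a common span-based formulation in which `scores[i][j][c]` is the probability that tokens `i` to `j` (inclusive) form an entity of type `labels[c]`. The function names, label names, threshold, and example sentence are all hypothetical. Because every cell is decoded independently, overlapping (nested) spans can be emitted together, and span-level F1 counts a prediction as correct only when both its boundaries and its type match.

```python
from typing import List, Sequence, Tuple

# (start token index, end token index inclusive, entity type)
Span = Tuple[int, int, str]


def decode_spans(scores: Sequence[Sequence[Sequence[float]]],
                 labels: Sequence[str],
                 threshold: float = 0.5) -> List[Span]:
    """Read entities off a span score matrix (assumed layout, see lead-in)."""
    spans: List[Span] = []
    n = len(scores)
    for start in range(n):
        # Only the upper triangle is valid: a span cannot end before it starts.
        for end in range(start, n):
            cell = scores[start][end]
            best = max(range(len(labels)), key=lambda c: cell[c])
            if cell[best] >= threshold:
                spans.append((start, end, labels[best]))
    return spans


def span_f1(pred: Sequence[Span], gold: Sequence[Span]) -> float:
    """Exact-match F1: boundaries and type must both agree."""
    pred_set, gold_set = set(pred), set(gold)
    true_pos = len(pred_set & gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy sentence and label set, not taken from the WPEF dataset.
tokens = ["gearbox", "bearing", "overheating"]
labels = ["EQUIPMENT", "COMPONENT", "FAULT"]
scores = [[[0.0] * len(labels) for _ in tokens] for _ in tokens]
scores[0][0][0] = 0.9  # "gearbox" as EQUIPMENT
scores[0][1][1] = 0.8  # "gearbox bearing" as COMPONENT (nests the span above)
scores[2][2][2] = 0.7  # "overheating" as FAULT
scores[1][2][2] = 0.3  # below the threshold, so it is rejected

predicted = decode_spans(scores, labels)
gold = [(0, 0, "EQUIPMENT"), (0, 1, "COMPONENT"), (0, 2, "FAULT")]
print(predicted, span_f1(predicted, gold))
```

In this toy case the nested pair "gearbox" and "gearbox bearing" is recovered together, which is the property that makes span-based decoding suitable for nested entities. The model described in the abstract may use a different scoring layout or decoding rule.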

Key words: wind power energy equipment, Named Entity Recognition (NER), differential semantic encoding, multi-head biaffine encoder, span, Convolutional Block Attention Module (CBAM)

