Journal of Computer Applications
任登燃1,王淑营2
Abstract: To address the heavy nesting of entities and the long texts typical of the wind power equipment domain, a nested named entity recognition model based on differential boundary enhancement (DBE-NER) was proposed. Firstly, a semantic encoder module was used to obtain feature representations that fuse entity head and tail words, entity types, and relative distances, strengthening the model's ability to capture nested semantic features. Secondly, these representations were fed into an efficient differential semantic encoding module designed to resolve the ambiguity of nested entity boundaries. Thirdly, a grouped dilated attention network was used to improve recognition of long-text entities, nested entities, and nested boundaries. Finally, the feature score matrix was input into a span decoder to obtain the positions and categories of entities. Experimental results show that, on WPEF, a manually annotated fault dataset from a large wind power energy enterprise, the F1 score of DBE-NER was 0.92% and 1.07% higher than those of DIFINET (Boundary-Aware Semantic Differentiation and Filtration Network) and CNN-NER (Convolutional Neural Network for Nested Named Entity Recognition), respectively. The F1 score also improved on several public datasets.
Key words: wind energy equipment, nested named entity recognition, differential semantic encoder, multi-head biaffine encoder, span, convolutional block attention module
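Abstract (translated from the Chinese): To address the heavy nesting of entities and the long texts typical of the wind power equipment domain, a nested named entity recognition model based on differential boundary enhancement (DBE-NER) is proposed, aiming to recognize nested entities effectively and to handle the diverse characteristics of the text. Firstly, a semantic encoder module obtains feature representations that fuse entity head and tail words, entity types, and relative distances, strengthening the model's ability to capture nested semantic features. Secondly, an efficient differential semantic encoding module is designed to resolve the ambiguity of nested entity boundaries. Thirdly, a grouped dilated attention network is used to improve recognition of long-text entities, nested entities, and nested boundaries. Finally, the feature score matrix is input into a span decoder to obtain entity positions and categories. Experimental results show that the model outperforms methods from the past two years on multiple datasets. On ACE2004 and ACE2005, DBE-NER reaches F1 scores of 87.88% and 87.12%, respectively, both higher than the baseline models. On the Genia dataset, DBE-NER reaches an F1 score of 80.79%, with precision up by 0.32% and recall up by 0.41%. On WPEF, a manually annotated fault dataset from a large wind power energy enterprise, DBE-NER reaches an F1 score of 87.01%, with precision up by 3.24%; compared with DIFINET (Boundary-Aware Semantic Differentiation and Filtration Network) and CNN-NER (Convolutional Neural Network for Nested Named Entity Recognition), its F1 score on WPEF is higher by 0.92% and 1.07%, respectively. These results indicate that DBE-NER improves precision and recall across the datasets, supporting its advantage on complex nested entities and long-text tasks.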
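Key words (translated from the Chinese): wind power energy equipment, nested named entity recognition, differential semantic encoding, multi-head biaffine encoding, span, convolutional attention mechanism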
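The last step named in the abstract, reading entity positions and categories out of a feature score matrix with a span decoder, can be illustrated with a minimal sketch. This is not the authors' implementation: the score layout (`scores[start][end][k]`), the fixed threshold, the label names `COMPONENT` and `FAULT`, and the example values are all assumptions made for illustration. The point it shows is why span-based decoding handles nesting: every (start, end) cell is scored independently, so an entity contained in another entity is simply a second cell above the threshold.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index inclusive, label)


def decode_spans(
    scores: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[str],
    threshold: float = 0.5,
) -> List[Span]:
    """Read entity spans out of an n x n x len(labels) score matrix.

    Assumed layout (hypothetical): scores[i][j][k] is the score that
    tokens i..j (inclusive) form an entity of type labels[k].
    """
    n = len(scores)
    spans: List[Span] = []
    for start in range(n):
        # Only the upper triangle is valid: a span cannot end before it starts.
        for end in range(start, n):
            cell = scores[start][end]
            best = max(range(len(labels)), key=lambda k: cell[k])
            if cell[best] >= threshold:
                spans.append((start, end, labels[best]))
    return spans


# Hypothetical example: four tokens, two made-up entity types.
TOKENS = ["gearbox", "oil", "pump", "failure"]
LABELS = ["COMPONENT", "FAULT"]

n = len(TOKENS)
example_scores = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
example_scores[0][0] = [0.9, 0.1]  # "gearbox"
example_scores[0][2] = [0.8, 0.1]  # "gearbox oil pump" (contains "gearbox")
example_scores[0][3] = [0.2, 0.7]  # "gearbox oil pump failure" (contains both)

for start, end, label in decode_spans(example_scores, LABELS):
    print(label, " ".join(TOKENS[start:end + 1]))
```

With these made-up scores the decoder returns three spans that share the same start token, one inside the next, which is the nested case the abstract targets. A real decoder would take its scores from the model and would likely choose the threshold or an argmax-with-null-class rule differently.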
CLC Number:
TP391.1
任登燃, 王淑营. Nested entity recognition model for wind power equipment based on differential boundary enhancement [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2024081159.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024081159