Journal of Computer Applications: 66-71. DOI: 10.11772/j.issn.1001-9081.2024030311

• Artificial Intelligence •

Named entity recognition of electric power research results based on Transformer and keyword information aggregation

徐晓轶, 毛艳芳, 吕晓祥

  1. Nantong Power Supply Branch, State Grid Jiangsu Electric Power Company Limited, Nantong Jiangsu 226300, China
  • Received: 2024-03-25 Revised: 2024-05-27 Accepted: 2024-06-04 Online: 2025-01-24 Published: 2024-12-31
  • Corresponding author: Xiaozhi XU
  • About the authors: Xiaozhi XU (徐晓轶), born in 1984, male, a native of Nantong, Jiangsu, senior engineer, M.S.; research interests: electrical engineering.
    Yanfang MAO (毛艳芳), born in 1989, female, a native of Xiaogan, Hubei, senior engineer, M.S.; research interests: electrical engineering.
    Xiaoxiang LYU (吕晓祥), born in 1989, male, a native of Nantong, Jiangsu, senior engineer, M.S.; research interests: electrical engineering.
  • Funding:
    Science and Technology Project of State Grid Jiangsu Electric Power Company Limited (J2023051)

Named entity recognition of electric power research results based on Transformer and keyword information aggregation

Xiaozhi XU, Yanfang MAO, Xiaoxiang LYU

  1. Nantong Power Supply Branch, State Grid Jiangsu Electric Power Company Limited, Nantong Jiangsu 226300, China
  • Received: 2024-03-25 Revised: 2024-05-27 Accepted: 2024-06-04 Online: 2025-01-24 Published: 2024-12-31
  • Contact: Xiaozhi XU

Abstract:

Research outputs produced by scientific research activities in the electric power field, such as papers and patents, contain rich information; however, Named Entity Recognition (NER) for electric power text has been studied relatively little. Therefore, a model that can effectively recognize named entities in Chinese electric power text was constructed, and its performance and effectiveness were verified. First, keywords of electric power literature were crawled, then preprocessed and organized, and a lexicon of named entities in the electric power field was built. Second, combined with word segmentation technology, the collected abstracts of electric power literature were annotated with named entities, generating an annotated NER corpus for the electric power field. To strengthen the representation and semantic understanding abilities of the model, a Transformer encoder mechanism was introduced into the BiLSTM-CRF model. To improve the adaptability of the model to the electric power vertical domain, a knowledge graph between electric power research keywords and characters was constructed; based on this graph, a neighborhood matrix fusing neighbor information was obtained for each character, and then neighbor information vectors fusing the knowledge graph entities of keywords and characters were derived. By constructing a dual-branch word embedding input layer, word embedding vectors containing both contextual information and aggregated keyword neighbor information can be obtained. Experimental results show that the proposed model achieves good recognition performance in the electric power field.
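As a rough illustration of the keyword knowledge graph step sketched above, the following Python snippet (not from the paper; the toy character/keyword vocabulary, the random keyword embeddings and the mean-pooling aggregation are all assumptions of this note) builds a character-to-keyword neighborhood matrix and derives a neighbor information vector for each character.

import numpy as np

# Toy vocabulary: 4 characters and 2 keyword entities from a
# hypothetical keyword-character knowledge graph.
chars = ["变", "压", "器", "站"]
keywords = ["变压器", "变电站"]
emb_dim = 8
rng = np.random.default_rng(0)

# Hypothetical embeddings for the keyword entities (in the paper these
# would come from the knowledge graph / pretrained vectors).
keyword_emb = rng.normal(size=(len(keywords), emb_dim))

# Neighborhood matrix A: A[i, j] = 1 if character i appears in keyword j,
# i.e. the character node is linked to that keyword entity in the graph.
A = np.zeros((len(chars), len(keywords)))
for i, ch in enumerate(chars):
    for j, kw in enumerate(keywords):
        if ch in kw:
            A[i, j] = 1.0

# Neighbor information vector per character: average the embeddings of
# the keyword entities adjacent to it (row-normalized aggregation).
deg = A.sum(axis=1, keepdims=True)
deg[deg == 0] = 1.0                      # guard isolated characters
neighbor_info = (A / deg) @ keyword_emb  # shape: (num_chars, emb_dim)

print(neighbor_info.shape)               # (4, 8)

The paper's actual aggregation scheme may differ; the sketch only illustrates the data flow from graph neighborhoods to per-character neighbor vectors that feed the second branch of the input layer.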

Key words: knowledge extraction, electrical engineering, Transformer, multi-head attention mechanism, Named Entity Recognition (NER)

Abstract:

The scientific research achievements generated in the field of electricity, such as papers and patents, contain rich information; however, research on Named Entity Recognition (NER) for electric power text remains insufficient. Therefore, a model that can effectively recognize named entities in Chinese electric power text was constructed, and its performance and effectiveness were verified. Firstly, keywords of electric power literature were crawled, preprocessed and organized, and a thesaurus of named entities in the electric power field was constructed. Secondly, combined with word segmentation technology, the acquired literature abstracts in the field of electric power were labeled with named entities, and the annotated NER corpus for the electric power field was generated. To improve the representation and semantic understanding abilities of the model, a Transformer encoder mechanism was introduced into the BiLSTM-CRF model. To improve the adaptability of the model to the electric power vertical field, a knowledge graph between electric power research keywords and words was constructed, and a neighbor matrix fusing neighbor information was obtained for each word based on this graph; after that, neighbor information vectors fusing the knowledge graph entities of keywords and words were obtained. By constructing a dual-branch word embedding input layer, word embedding vectors containing both contextual information and aggregated keyword neighbor information were obtained. Experimental results show that the proposed model achieves good recognition performance in the electric power domain.
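For a concrete picture of the encoder described above, the following minimal PyTorch sketch (an illustrative assumption, not the authors' implementation; class and parameter names, layer sizes and the shape check are ours, and the CRF decoding layer is only noted in a comment) stacks a Transformer encoder over character embeddings, feeds the result to a BiLSTM, and outputs per-token emission scores for the tag set.

import torch
import torch.nn as nn

class TransformerBiLSTMEncoder(nn.Module):
    """Sketch of the tagging encoder: character embeddings (optionally
    concatenated with keyword-neighbor vectors) -> Transformer encoder
    -> BiLSTM -> per-token emission scores over the tag set."""

    def __init__(self, vocab_size, num_tags, d_model=128, nhead=4,
                 num_layers=2, lstm_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        self.bilstm = nn.LSTM(d_model, lstm_hidden, batch_first=True,
                              bidirectional=True)
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)

    def forward(self, token_ids, pad_mask):
        # pad_mask: True at padding positions, shape (batch, seq_len)
        x = self.embed(token_ids)
        x = self.transformer(x, src_key_padding_mask=pad_mask)
        x, _ = self.bilstm(x)
        return self.emission(x)            # (batch, seq_len, num_tags)

# Quick shape check with random ids; a CRF layer (e.g. from the
# third-party pytorch-crf package) would decode these emission scores
# into tag sequences.
model = TransformerBiLSTMEncoder(vocab_size=3000, num_tags=9)
ids = torch.randint(1, 3000, (2, 20))
mask = torch.zeros(2, 20, dtype=torch.bool)
print(model(ids, mask).shape)              # torch.Size([2, 20, 9])

A common rationale for such a stack, consistent with the abstract, is that multi-head attention in the Transformer encoder captures long-range context while the BiLSTM preserves local sequential order before the emissions are passed to the CRF for structured decoding.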

Key words: knowledge extraction, electrical engineering, Transformer, multi-head attention mechanism, Named Entity Recognition (NER)

CLC number: