Journal of Computer Applications (《计算机应用》) ›› Vol. , Issue (): 61-65. DOI: 10.11772/j.issn.1001-9081.2023111687

• Artificial Intelligence •

Chemical safety accident named entity recognition based on self-attention and SoftLexicon

Yang GAO, Han LI, Weihe ZHONG   

  1. School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou, Liaoning 121001, China
  • Received: 2023-12-13  Revised: 2024-08-20  Accepted: 2024-08-22  Online: 2025-01-24  Published: 2024-12-31
  • Contact: Han LI
  • About the authors: GAO Yang (born 1998), male, from Qingdao, Shandong; M.S. candidate; main research interests: complex networks, embedded systems.
    LI Han (born 1984), male, from Xingcheng, Liaoning; associate professor, Ph.D.; main research interests: knowledge graphs, complex networks.
    ZHONG Weihe (born 1974), male, from Zhuanghe, Liaoning; associate professor, M.S.; main research interest: knowledge completion techniques.
  • Funding:
    Liaoning Provincial "Open Competition" Science and Technology Research Project (2022JH1/10400009)

Abstract:

For chemical safety problems, Named Entity Recognition (NER) for chemical safety accidents can effectively identify information such as the location of an accident, the chemicals involved, and the personnel responsible. However, in the chemical safety accident domain, NER faces diverse information and cannot fully exploit word information. To address these problems, a fusion network model was proposed that combines a SoftLexicon-based Lattice Long Short-Term Memory (LSTM) network with a self-attention mechanism. First, the input sentence was expanded to the character level, and character features were constructed with the help of external lexicon resources while word information was introduced. Then, the character features were fed into a Lattice-LSTM-CRF (Conditional Random Field) layer augmented with the self-attention mechanism, and entity recognition was performed in combination with the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model. Experimental results show that the proposed model achieves an F1 score of 88.61% on the chemical industry dataset, and that its metrics on the public Weibo and Resume datasets surpass those of mainstream models such as character-level BiLSTM-CRF and Lattice-LSTM. Thus, the proposed model can effectively accomplish NER tasks in the chemical safety accident domain.

Key words: Named Entity Recognition (NER), SoftLexicon, self-attention mechanism, chemical safety, word information
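The first step described in the abstract, constructing character features from external lexicon resources, follows the general SoftLexicon idea: every lexicon word containing a given character is assigned to one of four positional sets, depending on whether the character Begins, is in the Middle of, Ends, or alone forms (Single) that word. A minimal sketch of this grouping is shown below; the toy lexicon and sentence are illustrative only, not the dictionary or data used in the paper, and the weighted-embedding pooling step that follows in the full method is omitted:

```python
def soft_lexicon_sets(sentence, lexicon):
    """For each character of `sentence`, collect the lexicon words that
    contain it, grouped into B/M/E/S (Begin/Middle/End/Single) sets."""
    n = len(sentence)
    feats = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if len(word) == 1:
                feats[i]["S"].add(word)       # single-character word
            else:
                feats[i]["B"].add(word)       # word begins at position i
                feats[j - 1]["E"].add(word)   # word ends at position j-1
                for k in range(i + 1, j - 1):
                    feats[k]["M"].add(word)   # interior characters
    return feats

# Illustrative lexicon: 化工 (chemical industry), 化工厂 (chemical plant),
# 工厂 (factory), 厂 (plant).
lexicon = {"化工", "化工厂", "工厂", "厂"}
feats = soft_lexicon_sets("化工厂", lexicon)
print(sorted(feats[0]["B"]))  # → ['化工', '化工厂']
```

In the full method, each of the four sets is mapped to a fixed-length vector by frequency-weighted pooling of the word embeddings, and the four vectors are concatenated to the character embedding before the Lattice-LSTM-CRF layer.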

CLC number: