Journal of Computer Applications, 2020, Vol. 40, Issue (8): 2182-2188. DOI: 10.11772/j.issn.1001-9081.2019122255

• Artificial intelligence •

Entity recognition and relation extraction model for coal mines

ZHANG Xinyi1,2,3, FENG Shimin1,2,3, DING Enjie1,2,3

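  1. The National Joint Engineering Laboratory of Internet Applied Technology of Mines (China University of Mining and Technology), Xuzhou Jiangsu 221008, China;
    2. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou Jiangsu 221008, China;
    3. IoT Perception Mine Research Center, China University of Mining and Technology, Xuzhou Jiangsu 221008, China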
  • Received: 2020-01-09 Revised: 2020-04-24 Publication date: 2020-08-10 Release date: 2020-05-14
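  • Corresponding author: FENG Shimin (born 1983), male, from Xuzhou, Jiangsu, research fellow, Ph.D.; main research interests: mine Internet of Things, multi-sensor intelligent information fusion and its applications; 879151468@qq.com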
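  • About the authors: ZHANG Xinyi (born 1995), female, from Xi'an, Shaanxi, master's student; main research interests: mine Internet of Things, methods for recognizing miners' unsafe behaviors and their applications. DING Enjie (born 1962), male, from Xuzhou, Jiangsu, professor, Ph.D.; main research interest: mine Internet of Things.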
  • Funding:
    National Key Research and Development Program of China (2017YFC0804401).

Entity recognition and relation extraction model for coal mine

ZHANG Xinyi1,2,3, FENG Shimin1,2,3, DING Enjie1,2,3   

  1. The National Joint Engineering Laboratory of Internet Applied Technology of Mines (China University of Mining and Technology), Xuzhou Jiangsu 221008, China;
    2. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou Jiangsu 221008, China;
    3. IoT Perception Mine Research Center, China University of Mining and Technology, Xuzhou Jiangsu 221008, China
  • Received: 2020-01-09 Revised: 2020-04-24 Online: 2020-08-10 Published: 2020-05-14
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2017YFC0804401).

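Abstract: To address problems in coal mine knowledge extraction such as term nesting, polysemy, and error propagation between extraction tasks, a deep attention model framework is proposed. First, a tagging strategy is used to learn the two knowledge extraction subtasks jointly, which addresses error propagation. Second, a projection method that combines information from multiple kinds of word vectors is proposed to alleviate polysemy in coal mine term extraction. Third, a deep feature extraction network is designed, and a deep attention model with two model enhancement schemes is proposed to extract semantic information fully. Finally, the classification layer of the model is studied so that the model can be simplified as far as possible without degrading extraction performance. Experimental results show that, on a coal mine corpus, the proposed model raises the F1 score by 1.5 percentage points over the best encoder-decoder model, while training speed rises to nearly 3 times the original. The model can effectively perform the two knowledge extraction subtasks of coal mine term extraction and term relation extraction.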

Key words: named entity recognition, relation extraction, joint learning, attention mechanism, word vector

Abstract: In view of the problems of term nesting, polysemy and error propagation between extraction subtasks in coal mine knowledge extraction, a deep attention model framework was proposed. First, a tagging strategy was used to jointly learn the two knowledge extraction subtasks, solving the problem of error propagation. Second, a projection method combining information from multiple kinds of word vectors was proposed to alleviate the polysemy problem in coal mine term extraction. Third, a deep feature extraction network was designed, and a deep attention model and two model enhancement schemes were proposed to fully extract semantic information. Finally, the classification layer of the model was studied in order to simplify the model as far as possible without degrading the extraction effect. Experimental results show that, on the coal mine corpus, the proposed model increases the F1-score by 1.5 percentage points compared with the best encoder-decoder model, and raises the training speed to nearly 3 times the original. The proposed model can effectively complete the two knowledge extraction subtasks of term extraction and term relation extraction in the coal mine field.

Key words: named entity recognition, relation extraction, joint learning, attention mechanism, word vector
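
The abstract states that a tagging strategy lets the two subtasks be learned jointly, but it does not spell out the tag format. The sketch below shows one common way such a strategy can work, under the assumption that each token's tag combines a position marker (B/I/E/S), a relation type, and the term's role in the relation (1 or 2). The function name `joint_tags`, the `CAUSE` label, and the example sentence are hypothetical illustrations, not the authors' actual scheme.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def joint_tags(num_tokens: int,
               relations: List[Tuple[Span, Span, str]]) -> List[str]:
    """Encode terms and their relation into a single tag sequence.

    Assumed (hypothetical) tag format: <position>-<relation>-<role>,
    where position is B/I/E/S and role is 1 (first term) or 2 (second term).
    Tokens outside any related term are tagged "O".
    """
    tags = ["O"] * num_tokens
    for first, second, rel in relations:
        for role, (start, end) in (("1", first), ("2", second)):
            for i in range(start, end):
                if end - start == 1:
                    pos = "S"      # single-token term
                elif i == start:
                    pos = "B"      # beginning of a multi-token term
                elif i == end - 1:
                    pos = "E"      # end of a multi-token term
                else:
                    pos = "I"      # inside a multi-token term
                tags[i] = f"{pos}-{rel}-{role}"
    return tags


# Hypothetical example sentence and relation label.
tokens = ["gas", "accumulation", "causes", "explosion"]
tags = joint_tags(len(tokens), [((0, 2), (3, 4), "CAUSE")])
for token, tag in zip(tokens, tags):
    print(token, tag)
```

With tags of this kind, a single sequence labeler predicts terms and relations in one pass, so no separate relation classifier consumes (and inherits errors from) the output of an entity recognizer.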

CLC number: