Journal of Computer Applications (《计算机应用》), sole official website


Triplet extraction method for the coal mine electromechanical equipment field

You Xindong (游新冬)1, Wen Yingzi (问英姿)2, She Xinpeng (佘鑫鹏)2, Lyu Xueqiang (吕学强)3

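  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University
  2. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University
  3. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101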
  • Received: 2023-07-14 Revised: 2023-09-14 Online: 2023-10-26 Published: 2023-10-26
  • Corresponding author: Wen Yingzi (问英姿)

Triplet extraction method for mine electromechanical equipment field

  • Received:2023-07-14 Revised:2023-09-14 Online:2023-10-26 Published:2023-10-26

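Abstract (from the Chinese): To address the scarcity of corpora related to electromechanical equipment, the insufficient mining of domain features for relation types, and the presence of overlapping triplets in texts, a triplet extraction method that integrates prompt learning and prior knowledge with iterative adversarial training is proposed. First, the BERT model is fine-tuned on a self-constructed corpus to obtain feature vectors for the input text. Next, Projected Gradient Descent (PGD) is used for iterative adversarial training at the embedding layer, improving the model's resistance to perturbed samples and its generalization to real samples. Then, a single-layer head-tail pointer network identifies the head entity, a prompt learning template is used to obtain the domain prior features corresponding to that head entity, and the character vectors are combined with the prompt vectors predicted from the template. Finally, within a hierarchical tagging framework, a single-layer head-tail pointer network identifies, one relation at a time, the tail entities for all predefined relation types. Compared with the baseline model CasRel, the proposed method improves precision, recall, and F1 score by 3.10, 6.12, and 4.88 percentage points, respectively. The experimental results indicate that the method has certain advantages in the triplet extraction task for the coal mine electromechanical equipment field.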

Abstract: To address the scarcity of domain-specific corpora for electromechanical equipment, the insufficient mining of domain features for relation types, and the presence of overlapping triplets in texts, a triplet extraction method that integrates prompt learning and prior knowledge with iterative adversarial training was proposed. Firstly, the BERT model was fine-tuned on a self-constructed corpus to obtain feature vectors for input texts. Then, iterative adversarial training using the Projected Gradient Descent (PGD) method was conducted at the embedding layer to enhance the model's resistance to perturbed samples and its generalization ability on real samples. Next, a single-layer head-tail pointer network was used to identify the head entity, domain-specific prior features corresponding to the head entity were obtained through a prompt learning template, and the character vectors were combined with the prompt vectors predicted from the template. Finally, within a hierarchical tagging framework, a single-layer head-tail pointer network was employed to identify, one relation at a time, the tail entities for all predefined relation types. Compared with the baseline model CasRel, the proposed method improves precision, recall, and F1 score by 3.10, 6.12, and 4.88 percentage points, respectively. Experimental results demonstrate that the proposed method has certain advantages in the triplet extraction task for the coal mine electromechanical equipment field.
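The head-tail pointer decoding mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes CasRel-style decoding, in which each predicted start position is paired with the nearest predicted end position at or after it. The function names, the 0.5 threshold, and the relation names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # (start index, end index), both inclusive


def decode_spans(start_probs: Sequence[float],
                 end_probs: Sequence[float],
                 threshold: float = 0.5) -> List[Span]:
    """Turn per-token start/end pointer scores into entity spans.

    Assumed rule (CasRel-style): every position whose start score exceeds
    the threshold is matched with the nearest position at or after it
    whose end score also exceeds the threshold.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans: List[Span] = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:  # a start with no end after it is dropped
            spans.append((s, following[0]))
    return spans


def decode_tails(tail_start: Sequence[Sequence[float]],
                 tail_end: Sequence[Sequence[float]],
                 relations: Sequence[str],
                 threshold: float = 0.5) -> Dict[str, List[Span]]:
    """Decode tail entities relation by relation for one head entity.

    tail_start[r][i] and tail_end[r][i] are the pointer scores of token i
    under relation r. Relations with no decoded span are omitted.
    """
    result: Dict[str, List[Span]] = {}
    for r, name in enumerate(relations):
        spans = decode_spans(tail_start[r], tail_end[r], threshold)
        if spans:
            result[name] = spans
    return result


# Toy scores for a 5-token sentence (values are made up).
head_spans = decode_spans([0.9, 0.1, 0.1, 0.8, 0.2],
                          [0.1, 0.9, 0.1, 0.1, 0.7])

# Hypothetical relation inventory with one score row per relation.
relations = ["has_component", "has_fault"]
tails = decode_tails(
    tail_start=[[0.0, 0.0, 0.7, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]],
    tail_end=[[0.0, 0.0, 0.0, 0.6, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]],
    relations=relations,
)
```

Because tail entities are decoded separately for each relation, the same head entity can take part in several triplets, which is how this style of tagging accommodates overlapping triplets.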

CLC number: