Twice attention mechanism distantly supervised relation extraction based on BERT

Journal of Computer Applications, 2024, Vol. 44, Issue 4: 1080-1085. DOI: 10.11772/j.issn.1001-9081.2023040490

Special Issue: Artificial intelligence

Quan YUAN¹,², Changping CHEN¹,², Ze CHEN¹,², Linfeng ZHAN¹,²

1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Research Center of New Communication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2023-05-04  Revised: 2023-07-03  Accepted: 2023-07-10  Online: 2023-12-04  Published: 2024-04-10
Contact: Changping CHEN (2501357195@qq.com)
About the authors: YUAN Quan, born in 1976, native of Shaoyang, Hunan, M. S., professor-level senior engineer. His research interests include big data and natural language processing.
CHEN Changping, born in 1997, native of Chongqing, M. S. candidate. His research interests include natural language processing.
CHEN Ze, born in 1999, native of Wuhan, Hubei, M. S. candidate. His research interests include image processing.
ZHAN Linfeng, born in 1996, native of Lu'an, Anhui, M. S. candidate. His research interests include recommendation algorithms.

Abstract

To address the incomplete semantic information of word vectors and the word polysemy encountered in text feature extraction, a Twice Attention mechanism weighting algorithm for Relation Extraction (TARE) based on BERT (Bidirectional Encoder Representations from Transformers) word vectors was proposed. Firstly, in the word embedding stage, a self-attention dynamic encoding algorithm constructed Q, K and V matrices so that the vector of the current word captured semantic information from the words before and after it in the text. Then, after the model output the sentence-level feature vectors, a locator was used to extract the corresponding parameters of the fully connected layer and construct the relation attention matrix. Finally, a sentence-level attention mechanism assigned a different attention score to each sentence-level feature vector to improve the noise immunity of sentence-level features. The experimental results show that, on the NYT-10m dataset, compared with the Contrastive Instance Learning (CIL) algorithm, which is based on a contrastive learning framework, TARE increases the F1 value by 4.0 percentage points and the mean of Precision@100, Precision@200 and Precision@300 (P@M), computed over predictions ranked by descending confidence, by 11.3 percentage points. On the NYT-10d dataset, compared with the Piecewise Convolutional Neural Network algorithm based on ATTention mechanism (PCNN-ATT), TARE increases the AUC (Area Under the precision-recall Curve) by 4.8 percentage points and P@M by 2.1 percentage points. On mainstream Distantly Supervised Relation Extraction (DSRE) tasks, TARE effectively improves the model's ability to learn data features.
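The two attention steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, and the choice of the relation query as one row of the fully connected classifier's weight matrix is an assumption about how the "corresponding parameters of the fully connected layer" are used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(queries, keys, values):
    """Scaled dot-product attention: every token mixes all value vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        mixed = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


def bag_attention(sentence_vecs, fc_weight, relation_id):
    """Weight the sentences of one bag by similarity to a relation query.

    Assumption: the query is the row of the fully connected classifier
    weight matrix that belongs to `relation_id`.
    """
    query = fc_weight[relation_id]
    weights = softmax([dot(s, query) for s in sentence_vecs])
    bag_vec = [sum(w * s[d] for w, s in zip(weights, sentence_vecs))
               for d in range(len(query))]
    return weights, bag_vec


# Toy bag: two sentences aligned with relation 0 and one noisy sentence.
sentences = [[2.0, 0.0], [1.5, 0.1], [0.0, 2.0]]
fc_weight = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-relation classifier
weights, bag_vec = bag_attention(sentences, fc_weight, relation_id=0)
```

In the toy bag, the noisy third sentence receives the smallest weight, which is the sense in which sentence-level attention is meant to improve noise immunity under distant supervision.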

Key words: distant supervision, relation extraction, attention mechanism, word embedding feature, fully connected layer

CLC Number: TP391.1

Quan YUAN, Changping CHEN, Ze CHEN, Linfeng ZHAN. Twice attention mechanism distantly supervised relation extraction based on BERT[J]. Journal of Computer Applications, 2024, 44(4): 1080-1085.

Figures/Tables 6

References 25

1	ALT C, GABRYSZAK A, HENNIG L. Probing linguistic features of sentence-level representations in neural relation extraction[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 1534-1545. DOI: 10.18653/v1/2020.acl-main.140
2	TANG Y, HUANG J, WANG G, et al. Orthogonal relation transforms with graph context modeling for knowledge graph embedding[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 2713-2722. DOI: 10.18653/v1/2020.acl-main.241
3	WANG C, JIANG H. Explicit utilization of general knowledge in machine reading comprehension[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2263-2272. DOI: 10.18653/v1/p19-1219
4	SAXENA A, TRIPATHI A, TALUKDAR P. Improving multi-hop question answering over knowledge graphs using knowledge base embeddings[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 4498-4507. DOI: 10.18653/v1/2020.acl-main.412
5	WEI Z, SU J, WANG Y, et al. A novel cascade binary tagging framework for relational triple extraction[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 1476-1488. DOI: 10.18653/v1/2020.acl-main.136
6	ALT C, HÜBER M, HENNIG L. Fine-tuning pre-trained transformer language models to distantly supervised relation extraction[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 1388-1398. DOI: 10.18653/v1/p19-1134
7	VEYSEH A P B, DERNONCOURT F, DOU D J, et al. Exploiting the syntax-model consistency for neural relation extraction[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 8021-8032. DOI: 10.18653/v1/2020.acl-main.715
8	MINTZ M, BILLS S, SNOW R, et al. Distant supervision for relation extraction without labeled data[C]// Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Stroudsburg: ACL, 2009: 1003-1011. DOI: 10.3115/1690219.1690287
9	ZENG D, LIU K, LAI S, et al. Relation classification via convolutional deep neural network[C]// Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Stroudsburg: ACL, 2014: 2335-2344.
10	QIN P, XU W, WANG W Y. DSGAN: generative adversarial training for distant supervision relation extraction[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2018: 496-505. DOI: 10.18653/v1/p18-1046
11	HOFFMANN R, ZHANG C, LING X, et al. Knowledge-based weak supervision for information extraction of overlapping relations[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2011: 541-550.
12	RITTER A, ZETTLEMOYER L, MAUSAM, et al. Modeling missing data in distant supervision for information extraction[J]. Transactions of the Association for Computational Linguistics, 2013, 1: 367-378. DOI: 10.1162/tacl_a_00234
13	SURDEANU M, TIBSHIRANI J, NALLAPATI R, et al. Multi-instance multi-label learning for relation extraction[C]// Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Stroudsburg: ACL, 2012: 455-465.
14	ZENG D, LIU K, CHEN Y, et al. Distant supervision for relation extraction via piecewise convolutional neural networks[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 1753-1762. DOI: 10.18653/v1/d15-1203
15	VASHISHTH S, JOSHI R, PRAYAGA S S, et al. RESIDE: improving distantly-supervised neural relation extraction using side information[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2018: 1257-1266. DOI: 10.18653/v1/d18-1157
16	CHEN T, SHI H, TANG S, et al. CIL: contrastive instance learning framework for distantly supervised relation extraction[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2021: 6191-6200. DOI: 10.18653/v1/2021.acl-long.483
17	WANG J Y, LI Y, MA C M, et al. Short text classification based on graph convolutional neural networks with entity information[J]. Journal of Tianjin Normal University (Natural Science Edition), 2023, 43(1): 67-72. (in Chinese)
18	TANG H L, WEI H M, WANG Y L, et al. Text semantic enhancement method combining LDA and Word2vec[J]. Computer Engineering and Applications, 2022, 58(13): 135-145. (in Chinese) DOI: 10.3778/j.issn.1002-8331.2112-0491
19	ZHOU P, SHI W, TIAN J, et al. Attention-based bidirectional long short-term memory networks for relation classification[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg: ACL, 2016: 207-212. DOI: 10.18653/v1/p16-2034
20	YE Z-X, LING Z-H. Distant supervision relation extraction with intra-bag and inter-bag attentions[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 2810-2819. DOI: 10.18653/v1/n19-1288
21	LI D, ZHANG T, HU N, et al. HiCLRE: a hierarchical contrastive learning framework for distantly supervised relation extraction[C]// Findings of the Association for Computational Linguistics: ACL 2022. Stroudsburg: ACL, 2022: 2567-2578. DOI: 10.18653/v1/2022.findings-acl.202
22	RATHORE V, BADOLA K, SINGLA P, et al. PARE: a simple and strong baseline for monolingual and multilingual distantly supervised relation extraction[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg: ACL, 2022: 340-354. DOI: 10.18653/v1/2022.acl-short.38
23	RIEDEL S, YAO L, McCALLUM A. Modeling relations and their mentions without labeled text[C]// Proceedings of the 2010 Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Cham: Springer, 2010: 148-163. DOI: 10.1007/978-3-642-15939-8_10
24	BHARTIYA A, BADOLA K, MAUSAM. DiS-ReX: a multilingual dataset for distantly supervised relation extraction[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg: ACL, 2022: 849-863. DOI: 10.18653/v1/2022.acl-short.95
25	LIN Y, SHEN S, LIU Z, et al. Neural relation extraction with selective attention over instances[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2016: 2124-2133. DOI: 10.18653/v1/p16-1200

Dataset	Relation types	Total samples	Test-set samples	Test-set labeling
NYT-10d	58	694 000	172 000	Distant supervision
NYT-10m	25	474 000	9 740	Manual

Parameter	Symbol	Value
Word-vector dimension	Embedding_dim	768
Learning rate	Lr	10^-5, 2×10^-5
Maximum sentence length	Max_length	512
Batch size	Batch_size	16, 32, 64
Dropout	Dropout	0.5
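The parameter table can be written down as a small configuration sketch. The class and function names here are hypothetical, and treating the multiple learning-rate and batch-size values as candidates to be combined is an assumption; the page does not say how they were selected.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values listed in the parameter table."""
    embedding_dim: int = 768   # Embedding_dim
    max_length: int = 512      # Max_length
    dropout: float = 0.5       # Dropout
    lr: float = 1e-5           # Lr
    batch_size: int = 16       # Batch_size


# Values listed in the table; combining them as a grid is an assumption.
LEARNING_RATES = (1e-5, 2e-5)
BATCH_SIZES = (16, 32, 64)


def candidate_configs():
    """Enumerate every learning-rate / batch-size combination."""
    return [TrainConfig(lr=lr, batch_size=bs)
            for lr, bs in product(LEARNING_RATES, BATCH_SIZES)]
```

Under that assumption there are six candidate settings, all sharing the fixed embedding dimension, maximum length and dropout.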

Method	AUC	P@M
Method of Ref. [8]	10.7	49.2
PCNN-ATT	34.1	69.4
TARE	38.9	71.5

Method	AUC	F1	P@M
PCNN-ATT	41.9	32.0	68.6
DISTRE	35.7	31.4	65.1
CIL	56.0	34.3	75.9
TARE	54.1	38.3	87.2
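The P@M column is the mean of Precision@100, Precision@200 and Precision@300 over predictions ranked by descending confidence. A minimal sketch of that computation follows; the function names and the `(confidence, is_correct)` input layout are assumptions for illustration, and the toy data uses smaller cutoffs than the paper's.

```python
def precision_at_n(predictions, n):
    """Precision among the n most confident predictions.

    `predictions` is a list of (confidence, is_correct) pairs.
    """
    if not predictions or n <= 0:
        raise ValueError("need at least one prediction and a positive n")
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    top = ranked[:n]
    return sum(1 for _, correct in top if correct) / len(top)


def precision_at_m(predictions, cutoffs=(100, 200, 300)):
    """Mean of Precision@N over the given cutoffs."""
    return sum(precision_at_n(predictions, n) for n in cutoffs) / len(cutoffs)


# Toy ranking with six predictions, evaluated at cutoffs 2, 4 and 6.
toy = [(0.9, True), (0.8, True), (0.7, False),
       (0.6, True), (0.5, False), (0.4, False)]
toy_p_at_m = precision_at_m(toy, cutoffs=(2, 4, 6))
```

For the toy ranking, the three cutoffs give 1.0, 0.75 and 0.5, so the averaged score rewards models whose most confident predictions are correct.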

Model	NYT-10m AUC	NYT-10m F1	NYT-10m P@M	NYT-10d AUC	NYT-10d P@M
TARE	54.1	38.4	87.3	38.9	71.5
No sentence-attention	53.0	35.3	86.2	37.3	70.2
No self-attention	51.3	32.4	85.3	34.7	69.4
