Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (2): 393-402.DOI: 10.11772/j.issn.1001-9081.2023020143
Special Issue: Artificial Intelligence
• Artificial intelligence •
Andi GUO, Zhen JIA, Tianrui LI
Received: 2023-02-20
Revised: 2023-03-21
Accepted: 2023-04-03
Online: 2023-08-14
Published: 2024-02-10
Contact: Tianrui LI
About author: GUO Andi, born in 1998 in Heze, Shandong, M. S. candidate, CCF student member. His research interests include natural language processing and knowledge graphs.
Andi GUO, Zhen JIA, Tianrui LI. High-precision entity and relation extraction in medical domain based on pseudo-entity data augmentation[J]. Journal of Computer Applications, 2024, 44(2): 393-402.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023020143
Dataset type | Entries | Relations | Subjects | Objects |
---|---|---|---|---|
Training set | 14 339 | 43 660 | 18 797 | 41 478 |
Test set | 3 585 | 10 626 | 4 627 | 10 123 |
Augmented set | 14 339 | 450 053 | 26 028 | 113 630 |
Tab. 1 Information of CMeIE dataset used in experiment
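The growth introduced by augmentation can be read off Tab. 1 as simple ratios. The sketch below hard-codes the training and augmented rows of the table and divides them; the dictionary keys and the helper `growth` are illustrative names, not taken from the paper.

```python
# Counts copied from Tab. 1 (CMeIE); the key names are illustrative.
TRAIN = {"entries": 14339, "relations": 43660, "subjects": 18797, "objects": 41478}
AUGMENTED = {"entries": 14339, "relations": 450053, "subjects": 26028, "objects": 113630}


def growth(before: dict, after: dict) -> dict:
    """Return the after/before ratio for every count in `before`."""
    return {key: after[key] / before[key] for key in before}


ratios = growth(TRAIN, AUGMENTED)
for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.2f}")
```

The number of entries is unchanged, while the relation count grows by roughly a factor of ten.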
Model | Warm-up ratio | Batch size | Learning rate (×10⁻⁵) | Entity extraction: epochs | Entity extraction: max span length | Relation classification: epochs |
---|---|---|---|---|---|---|
SpERT | 0.1 | 2 | 5 | - | 20 | 20 |
PURE | 0.1 | 32 | 5 | 7 | - | 10 |
PL-Marker | 0.1 | 32 | 5 | 7 | 20 | 8 |
CBLUE | 0.1 | 32 | 5 | 7 | - | 8 |
Proposed model | 0.1 | 32 | 5 | 7 | - | 8 |
Tab. 2 Details of experimental parameters
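The hyperparameters in Tab. 2 can be written down as a small configuration object. The following is a minimal sketch for the last row of the table (the proposed model); the class name, field names, and the `warmup_steps` helper are hypothetical, and the helper assumes the warm-up ratio is applied to the total number of optimizer steps, which the table itself does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values from the last row of Tab. 2; field names are hypothetical."""
    warmup_ratio: float = 0.1
    batch_size: int = 32
    learning_rate: float = 5e-5  # the table lists lr in units of 1e-5
    entity_epochs: int = 7
    relation_epochs: int = 8


def warmup_steps(num_examples: int, epochs: int, cfg: TrainingConfig) -> int:
    """Warm-up steps, ASSUMING the ratio applies to total optimizer steps."""
    steps_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return int(cfg.warmup_ratio * steps_per_epoch * epochs)


cfg = TrainingConfig()
# 14339 is the training-set entry count from Tab. 1; using it as the
# number of training examples here is purely illustrative.
print(warmup_steps(14339, cfg.relation_epochs, cfg))
```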
Model | Entity extraction P | Entity extraction R | Entity extraction F1 | Relation extraction P | Relation extraction R | Relation extraction F1 |
---|---|---|---|---|---|---|
SpERT | 59.66 | 78.23 | 67.70 | 46.79 | 46.86 | 46.82 |
PURE | 72.94 | 70.48 | 71.69 | 53.87 | 48.70 | 51.16 |
PL-Marker | 75.47 | 70.55 | 72.92 | 57.71 | 50.08 | 53.63 |
CBLUE | 72.76 | 72.10 | 72.43 | 59.65 | 49.23 | 53.94 |
Proposed model | 76.01 | 73.20 | 74.57 | 68.97 | 48.39 | 56.88 |
Tab. 3 Comparison of experimental results among different models
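The F1 column in Tab. 3 is the harmonic mean of precision and recall, so the reported values can be recomputed from the P and R columns. The sketch below does this for the relation-extraction columns and also measures the precision gap between the last row (the proposed model) and the best baseline; the function and variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Relation-extraction (P, R, reported F1) copied from Tab. 3.
RELATION_RESULTS = {
    "SpERT": (46.79, 46.86, 46.82),
    "PURE": (53.87, 48.70, 51.16),
    "PL-Marker": (57.71, 50.08, 53.63),
    "CBLUE": (59.65, 49.23, 53.94),
    "Proposed model": (68.97, 48.39, 56.88),
}

for model, (p, r, reported) in RELATION_RESULTS.items():
    print(f"{model}: recomputed F1 = {f1_score(p, r):.2f}, reported = {reported}")

# Precision gap between the proposed model and the strongest baseline.
best_baseline_p = max(
    p for name, (p, _, _) in RELATION_RESULTS.items() if name != "Proposed model"
)
precision_gain = RELATION_RESULTS["Proposed model"][0] - best_baseline_p
print(f"precision gain over best baseline: {precision_gain:.2f} points")
```

Small differences in the second decimal place are expected, since the table's P and R values are themselves rounded.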
Model | Relation extraction P | Relation extraction R | Relation extraction F1 |
---|---|---|---|
Ground Truth | 85.69 | 60.08 | 70.64 |
PURE | 67.84 | 44.48 | 53.73 |
PL-Marker | 69.52 | 45.29 | 54.85 |
CBLUE | 69.08 | 47.47 | 56.27 |
TFRM | 69.97 | 48.39 | 56.88 |
Tab. 4 Comparison of experimental results of entity extraction models
Model | Entity extraction P | Entity extraction R | Entity extraction F1 | Entity-relation extraction P | Entity-relation extraction R | Entity-relation extraction F1 |
---|---|---|---|---|---|---|
Without TFRU | 71.52 | 72.04 | 71.52 | 69.08 | 47.47 | 56.27 |
1-layer TFRU | 76.80 | 71.94 | 74.29 | 70.22 | 46.46 | 55.92 |
2-layer TFRU | 76.01 | 73.20 | 74.57 | 69.97 | 48.39 | 56.88 |
3-layer TFRU | 76.47 | 72.96 | 74.67 | 69.98 | 48.14 | 56.86 |
Tab. 5 Comparison experiment results of TFRU module parameters
Augmentation strategy | Solid markers: P/% | Solid markers: R/% | Solid markers: F1/% | Solid markers: samples per second | Levitated markers: P/% | Levitated markers: R/% | Levitated markers: F1/% | Levitated markers: samples per second |
---|---|---|---|---|---|---|---|---|
| | 59.65 | 49.23 | 53.94 | 173.7 | 57.71 | 50.08 | 53.63 | 729.8 |
| | 57.71 | 52.01 | 54.71 | | 58.86 | 51.86 | 54.90 | |
| | 57.61 | 52.08 | 54.71 | | 57.60 | 52.00 | 54.65 | |
| | 60.43 | 51.99 | 55.89 | | 60.48 | 51.22 | 55.47 | |
| | 61.97 | 51.13 | 56.03 | | 61.94 | 50.69 | 55.75 | |
| | 62.15 | 50.75 | 55.87 | | 62.17 | 50.91 | 55.98 | |
| | 68.09 | 47.00 | 55.61 | | 67.68 | 47.42 | 55.77 | |
| | 67.85 | 44.97 | 54.09 | | 68.97 | 48.39 | 56.88 | |
| | 72.48 | 45.06 | 55.57 | | 71.12 | 46.46 | 56.20 | |
Tab. 6 Ablation experimental results of automatic generation module of relation negative examples
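Case | Category | Entity-relation example |
---|---|---|
Case 1 | Gold answer | [Miller-Fisher综合征] (Miller-Fisher syndrome) |
Case 1 | Without data augmentation | [Miller-Fisher综合征] (Miller-Fisher syndrome) |
Case 1 | With data augmentation | [Miller-Fisher综合征] (Miller-Fisher syndrome) |
Case 2 | Gold answer | [室上速] (supraventricular tachycardia) |
Case 2 | Without data augmentation | [室上速] (supraventricular tachycardia) |
Case 2 | With data augmentation | [室上速] (supraventricular tachycardia) |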
Tab. 7 Case analysis