Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (12): 3796-3803.DOI: 10.11772/j.issn.1001-9081.2024111681
• Artificial intelligence •
Jiana MENG, Chenhao BAI, Di ZHAO, Bolin WANG, Linlin GAO
Received:2024-12-02
Revised:2025-03-24
Accepted:2025-04-01
Online:2025-04-08
Published:2025-12-10
Contact:
Di ZHAO
About author: MENG Jiana, born in 1972 in Siping, Jilin, Ph. D., professor, CCF member. Her research interests include machine learning and text mining.
Jiana MENG, Chenhao BAI, Di ZHAO, Bolin WANG, Linlin GAO. Multimodal named entity recognition under causal intervention[J]. Journal of Computer Applications, 2025, 45(12): 3796-3803.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024111681
| Entity type | Twitter-2015 training | Twitter-2015 validation | Twitter-2015 test | Twitter-2017 training | Twitter-2017 validation | Twitter-2017 test |
|---|---|---|---|---|---|---|
| PER | 2 217 | 552 | 1 816 | 2 943 | 626 | 621 |
| LOC | 2 091 | 522 | 1 697 | 731 | 173 | 178 |
| ORG | 928 | 247 | 839 | 1 674 | 375 | 395 |
| MISC | 940 | 225 | 726 | 701 | 150 | 157 |
| Total entities | 6 176 | 1 546 | 5 078 | 6 049 | 1 324 | 1 351 |
| Samples | 4 000 | 1 000 | 3 257 | 3 373 | 723 | 723 |
Tab. 1 Distribution of Twitter-2015 and Twitter-2017 datasets
| Data type | Method | Twitter-2015 P | Twitter-2015 R | Twitter-2015 F1 | Twitter-2017 P | Twitter-2017 R | Twitter-2017 F1 |
|---|---|---|---|---|---|---|---|
| Text | BiLSTM-CRF | 68.14 | 61.09 | 64.42 | 79.42 | 73.43 | 76.31 |
| Text | CNN-BiLSTM-CRF | 66.24 | 68.09 | 67.15 | 80.00 | 78.76 | 79.37 |
| Text | HBiLSTM-CRF | 70.32 | 68.05 | 69.17 | 82.69 | 78.16 | 80.37 |
| Text | BERT | 68.30 | 74.61 | 71.32 | 82.19 | 83.72 | 82.95 |
| Text | BERT-CRF | 69.22 | 74.59 | 71.81 | 83.32 | 83.57 | 83.44 |
| Text + image | UMT | 71.67 | 75.23 | 73.41 | 85.28 | 85.34 | 85.31 |
| Text + image | MRC-MNER | 78.10 | 71.45 | 74.63 | 88.78 | 85.00 | 86.85 |
| Text + image | CAT-MNER | 76.19 | 74.65 | 75.41 | 87.04 | 84.97 | 85.99 |
| Text + image | HVPNeT | 73.87 | 76.82 | 75.32 | 85.84 | 87.93 | 86.87 |
| Text + image | MAFN | 71.99 | 75.19 | 73.56 | 85.66 | 85.79 | 85.72 |
| Text + image | DebiasCL | 74.49 | 76.13 | 75.28 | 87.59 | 86.11 | 86.84 |
| Text + image | DGCF | 74.76 | 75.50 | 75.13 | 88.50 | 87.65 | 88.07 |
| Text + image | ICKA | 72.36 | 78.75 | 75.42 | 85.13 | 89.19 | 87.12 |
| Text + image | AMLR | 74.96 | 75.21 | 75.09 | 85.75 | 87.27 | 86.50 |
| Text + image | Proposed method (CMNER) | 74.49 | 77.57 | 76.00 | 88.62 | 88.58 | 88.60 |
Tab. 2 Experimental results of MNER methods on two datasets
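The F1 columns in Tab. 2 are consistent with the standard harmonic mean of precision and recall, F1 = 2PR / (P + R). The page itself does not state the formula, so the Python sketch below assumes that definition. The function name `f1_score` and the `rows` dictionary are illustrative only, and the inputs are copied from the table. Because P and R are already rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from Tab. 2; the keys are illustrative labels.
rows = {
    "CMNER / Twitter-2015": (74.49, 77.57, 76.00),
    "CMNER / Twitter-2017": (88.62, 88.58, 88.60),
    "UMT / Twitter-2015": (71.67, 75.23, 73.41),
}

for name, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {reported:.2f}")
```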
| Ablation setting | Twitter-2015 P | Twitter-2015 R | Twitter-2015 F1 | Twitter-2015 ΔF1 | Twitter-2017 P | Twitter-2017 R | Twitter-2017 F1 | Twitter-2017 ΔF1 |
|---|---|---|---|---|---|---|---|---|
| CMNER | 74.49 | 77.57 | 76.00 | - | 88.62 | 88.58 | 88.60 | - |
| w/o causal intervention | 73.28 | 75.56 | 74.40 | -1.60 | 86.63 | 87.79 | 87.21 | -1.39 |
| w/o MI | 72.90 | 76.59 | 74.70 | -1.30 | 87.38 | 88.09 | 87.73 | -0.87 |
| w/o causal intervention & MI | 72.51 | 75.85 | 74.14 | -1.86 | 86.90 | 86.10 | 86.50 | -2.10 |
Tab. 3 Results of ablation experiments
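The ΔF1 columns in Tab. 3 appear to be each ablated variant's F1 minus the F1 of the full CMNER model. The sketch below reproduces that arithmetic in Python under this assumption. The F1 values are copied from the table, while the variable names and the `delta_f1` helper are hypothetical.

```python
# F1 scores copied from Tab. 3 as (Twitter-2015, Twitter-2017).
full_model = (76.00, 88.60)  # CMNER
variants = {
    "w/o causal intervention": (74.40, 87.21),
    "w/o MI": (74.70, 87.73),
    "w/o causal intervention & MI": (74.14, 86.50),
}


def delta_f1(variant, baseline):
    """F1 change of an ablated variant relative to the full model, per dataset."""
    return tuple(round(v - b, 2) for v, b in zip(variant, baseline))


deltas = {name: delta_f1(scores, full_model) for name, scores in variants.items()}

for name, (d15, d17) in deltas.items():
    print(f"{name}: {d15:+.2f} (Twitter-2015), {d17:+.2f} (Twitter-2017)")
```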
| Method | Example A | Example B | Example C |
|---|---|---|---|
| Text | [HARRY STYLES - PER] WITH SHORT HAIR SPOTTED | Thanks@ [newbalance - ORG] for these colorful shirts for our recreational soccer program. we love them! | Handsome [Rob - MISC] after a fish dinner |
| UMT | Hillary Clinton - PER √ | newbalance - PER × | Rob - MISC × |
| HVPNeT | Hillary Clinton - PER √ | newbalance - PER × | Rob - PER √ |
| ICKA | Hillary Clinton - PER √ | newbalance - PER √ | Rob - MISC × |
| Proposed method (CMNER) | Hillary Clinton - PER √ | newbalance - ORG √ | Rob - MISC √ |
Tab. 4 Prediction results of UMT, HVPNeT, ICKA, and CMNER methods on three test samples