Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (12): 3796-3803. DOI: 10.11772/j.issn.1001-9081.2024111681
Jiana MENG, Chenhao BAI, Di ZHAO, Bolin WANG, Linlin GAO
Received: 2024-12-02
Revised: 2025-03-24
Accepted: 2025-04-01
Online: 2025-04-08
Published: 2025-12-10
Corresponding author: Di ZHAO
About the author: MENG Jiana, born in 1972 in Siping, Jilin, Ph.D., professor, CCF member. Her research interests include machine learning and text mining.
Abstract:
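Multimodal Named Entity Recognition (MNER) aims to identify entities with specific meanings from joint text and image data. However, current methods fall short on two problems: data bias and the modality gap. Data bias lets harmful bias mislead attention modules into focusing on spurious correlations in the training data, which damages the model's generalization ability. The modality gap hinders correct semantic alignment between text and images, which degrades model performance. To address these two problems, a Causal intervention-based Multimodal Named Entity Recognition (CMNER) method is proposed. Drawing on causal intervention theory, the method applies backdoor intervention in the text modality to handle observable confounders, and front-door intervention in the image modality to handle confounders that cannot be observed directly, thereby mitigating the harmful effects of data bias. It also draws on Mutual Information (MI) theory to narrow the semantic "distance" between text and images. The entity recognition performance of the proposed method was evaluated in the multimodal setting. Experimental results on the Twitter-2015 and Twitter-2017 datasets show that CMNER achieves F1 scores of 76.00% and 88.60%, respectively, which are 0.58 and 0.53 percentage points higher than the second-best methods and the best among the compared methods. These results indicate that CMNER can effectively alleviate data bias and narrow the modality gap, thereby improving performance on the MNER task.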
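For readers unfamiliar with the three tools named in the abstract, their standard textbook forms are sketched below. This is background only, not the paper's own formulation: the symbols X (input), Y (prediction), Z (observed confounder), M (mediator), T (text representation) and V (image representation) are assumptions introduced here for illustration, and the way CMNER instantiates or approximates these quantities is not stated on this page.

```latex
% Backdoor adjustment: the confounder Z is observable
P(Y \mid do(X)) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)

% Front-door adjustment: the confounder is unobserved, the mediator M is observable
P(Y \mid do(X)) = \sum_{m} P(M = m \mid X) \sum_{x'} P(Y \mid M = m, X = x')\, P(X = x')

% Mutual information between a text representation T and an image representation V
I(T; V) = \sum_{t, v} p(t, v) \log \frac{p(t, v)}{p(t)\, p(v)}
```

Backdoor adjustment requires summing over the confounder, which is why it suits confounders that can be observed; front-door adjustment routes the effect through a mediator and therefore does not need the confounder itself. Maximizing a mutual-information term is one common way to pull paired representations closer.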
Cite this article: Jiana MENG, Chenhao BAI, Di ZHAO, Bolin WANG, Linlin GAO. Multimodal named entity recognition under causal intervention[J]. Journal of Computer Applications, 2025, 45(12): 3796-3803.
| Entity type | Twitter-2015 training | Twitter-2015 validation | Twitter-2015 test | Twitter-2017 training | Twitter-2017 validation | Twitter-2017 test |
|---|---|---|---|---|---|---|
| PER | 2 217 | 552 | 1 816 | 2 943 | 626 | 621 |
| LOC | 2 091 | 522 | 1 697 | 731 | 173 | 178 |
| ORG | 928 | 247 | 839 | 1 674 | 375 | 395 |
| MISC | 940 | 225 | 726 | 701 | 150 | 157 |
| Total entities | 6 176 | 1 546 | 5 078 | 6 049 | 1 324 | 1 351 |
| Number of samples | 4 000 | 1 000 | 3 257 | 3 373 | 723 | 723 |
Tab. 1 Distribution of Twitter-2015 and Twitter-2017 datasets
| Data type | Method | Twitter-2015 P | Twitter-2015 R | Twitter-2015 F1 | Twitter-2017 P | Twitter-2017 R | Twitter-2017 F1 |
|---|---|---|---|---|---|---|---|
| Text | BiLSTM-CRF | 68.14 | 61.09 | 64.42 | 79.42 | 73.43 | 76.31 |
| Text | CNN-BiLSTM-CRF | 66.24 | 68.09 | 67.15 | 80.00 | 78.76 | 79.37 |
| Text | HBiLSTM-CRF | 70.32 | 68.05 | 69.17 | 82.69 | 78.16 | 80.37 |
| Text | BERT | 68.30 | 74.61 | 71.32 | 82.19 | 83.72 | 82.95 |
| Text | BERT-CRF | 69.22 | 74.59 | 71.81 | 83.32 | 83.57 | 83.44 |
| Text + image | UMT | 71.67 | 75.23 | 73.41 | 85.28 | 85.34 | 85.31 |
| Text + image | MRC-MNER | 78.10 | 71.45 | 74.63 | 88.78 | 85.00 | 86.85 |
| Text + image | CAT-MNER | 76.19 | 74.65 | 75.41 | 87.04 | 84.97 | 85.99 |
| Text + image | HVPNeT | 73.87 | 76.82 | 75.32 | 85.84 | 87.93 | 86.87 |
| Text + image | MAFN | 71.99 | 75.19 | 73.56 | 85.66 | 85.79 | 85.72 |
| Text + image | DebiasCL | 74.49 | 76.13 | 75.28 | 87.59 | 86.11 | 86.84 |
| Text + image | DGCF | 74.76 | 75.50 | 75.13 | 88.50 | 87.65 | 88.07 |
| Text + image | ICKA | 72.36 | 78.75 | 75.42 | 85.13 | 89.19 | 87.12 |
| Text + image | AMLR | 74.96 | 75.21 | 75.09 | 85.75 | 87.27 | 86.50 |
| Text + image | Proposed method (CMNER) | 74.49 | 77.57 | 76.00 | 88.62 | 88.58 | 88.60 |
Tab. 2 Experimental results of MNER methods on two datasets (%)
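The F1 column in Tab. 2 is the harmonic mean of the P and R columns, and the margins quoted in the abstract (0.58 and 0.53 points) are the gaps to the best competing F1 on each dataset. The sketch below recomputes both from a few rows copied out of Tab. 2. The names `RESULTS`, `f1_score` and `margin_over_runner_up` are hypothetical helpers introduced here, not part of the paper, and only a subset of rows is included (it is assumed to contain the runner-up on each dataset, ICKA and DGCF).

```python
# Precision / recall / F1 values (in %) copied from Tab. 2 (subset of rows).
RESULTS = {
    "Twitter-2015": {
        "CAT-MNER": (76.19, 74.65, 75.41),
        "ICKA": (72.36, 78.75, 75.42),
        "CMNER": (74.49, 77.57, 76.00),
    },
    "Twitter-2017": {
        "DGCF": (88.50, 87.65, 88.07),
        "ICKA": (85.13, 89.19, 87.12),
        "CMNER": (88.62, 88.58, 88.60),
    },
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def margin_over_runner_up(rows: dict, ours: str = "CMNER") -> float:
    """F1 gap between `ours` and the best other method listed in `rows`."""
    best_other = max(f1 for name, (_, _, f1) in rows.items() if name != ours)
    return round(rows[ours][2] - best_other, 2)


if __name__ == "__main__":
    for dataset, rows in RESULTS.items():
        for name, (p, r, f1) in rows.items():
            print(f"{dataset} {name}: reported F1={f1:.2f}, "
                  f"recomputed={f1_score(p, r):.2f}")
        print(f"{dataset} margin over runner-up: {margin_over_runner_up(rows):.2f}")
```

Because P and R are themselves rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit.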
| Ablation variant | Twitter-2015 P | Twitter-2015 R | Twitter-2015 F1 | Twitter-2015 ΔF1 | Twitter-2017 P | Twitter-2017 R | Twitter-2017 F1 | Twitter-2017 ΔF1 |
|---|---|---|---|---|---|---|---|---|
| CMNER | 74.49 | 77.57 | 76.00 | - | 88.62 | 88.58 | 88.60 | - |
| w/o causal intervention | 73.28 | 75.56 | 74.40 | -1.60 | 86.63 | 87.79 | 87.21 | -1.39 |
| w/o MI | 72.90 | 76.59 | 74.70 | -1.30 | 87.38 | 88.09 | 87.73 | -0.87 |
| w/o causal intervention & MI | 72.51 | 75.85 | 74.14 | -1.86 | 86.90 | 86.10 | 86.50 | -2.10 |
Tab. 3 Results of ablation experiments (%)
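The ΔF1 columns in Tab. 3 are each variant's F1 minus the full model's F1. A minimal sketch that reproduces them from the F1 values in the table follows; `ABLATION_F1` and `f1_drops` are hypothetical names introduced here for illustration.

```python
# F1 scores (in %) copied from Tab. 3.
ABLATION_F1 = {
    "Twitter-2015": {
        "CMNER": 76.00,
        "w/o causal": 74.40,
        "w/o MI": 74.70,
        "w/o causal & MI": 74.14,
    },
    "Twitter-2017": {
        "CMNER": 88.60,
        "w/o causal": 87.21,
        "w/o MI": 87.73,
        "w/o causal & MI": 86.50,
    },
}


def f1_drops(scores: dict, full: str = "CMNER") -> dict:
    """F1 change of every ablated variant relative to the full model."""
    return {
        name: round(f1 - scores[full], 2)
        for name, f1 in scores.items()
        if name != full
    }


if __name__ == "__main__":
    for dataset, scores in ABLATION_F1.items():
        print(dataset, f1_drops(scores))
```

On both datasets the two single-component drops sum to more than the joint drop (2.90 vs. 1.86 on Twitter-2015, 2.26 vs. 2.10 on Twitter-2017), so the two components' contributions are not simply additive.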
| Method | Example A | Example B | Example C |
|---|---|---|---|
| Input text | [HARRY STYLES: PER] WITH SHORT HAIR SPOTTED | Thanks@ [newbalance: ORG] for these colorful shirts for our recreational soccer program. we love them! | Handsome [Rob: MISC] after a fish dinner |
| UMT | Hillary Clinton: PER √ | newbalance: PER × | Rob: MISC × |
| HVPNeT | Hillary Clinton: PER √ | newbalance: PER × | Rob: PER √ |
| ICKA | Hillary Clinton: PER √ | newbalance: PER √ | Rob: MISC × |
| Proposed method (CMNER) | Hillary Clinton: PER √ | newbalance: ORG √ | Rob: MISC √ |
Tab. 4 Prediction results of UMT, HVPNeT, ICKA, and CMNER methods on three test samples