Journal of Computer Applications (《计算机应用》), sole official website

Metaphor detection model based on linguistic multi-incongruity

Zheng Tianlong (郑天龙), Dong Rui (董瑞), Yang Yating (杨雅婷), Ma Bo (马博), Wang Lei (王磊), Zhou Xi (周喜)

  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences
  • Received: 2024-12-23 Revised: 2025-03-06 Online: 2025-03-24 Published: 2025-03-24
  • Corresponding author: Dong Rui (董瑞)
  • Supported by:
    Key Project of Xinjiang Uygur Autonomous Region Natural Science Foundation; Xinjiang Uygur Autonomous Region "Tianshan Talents" Training Program; Chinese Academy of Sciences Youth Innovation Promotion Association Program; Xinjiang Uygur Autonomous Region Key Research and Development Project

Abstract: Existing metaphor detection research overlooks metaphors that arise when a target word is polysemous in a specific context and the meaning of the target sentence is incongruent with the basic meaning of the target word. To address this problem, a metaphor detection model based on linguistic multi-incongruity was proposed. The model consists of three parts. First, in the feature encoding module, two separate encoders were used to encode the target sentence meaning, the basic meaning of the target word, and its contextual meaning. Second, in the multi-incongruity modeling module, three linguistic methods, Selectional Preference Violation (SPV), Metaphor Identification Procedure (MIP), and Semantics Usage Comparison (SUC), were used to model the multiple incongruity features in a unified way. Finally, a metaphor recognition module was used to detect metaphors. In addition, to evaluate Chinese metaphor detection, a Chinese word-level metaphor detection dataset, META-ZH, was constructed with a data annotation method that combines a large language model fine-tuned with LoRA (Low-Rank Adaptation) and manual error correction. Experimental results show that, compared with baseline systems such as QMM (Quantum-inspired Match for Metaphor), the proposed model improves the F1 score by 0.8, 1.3, 1.5, and 2.3 percentage points on the VUA-All, VUA-Verb, MOH-X, and META-ZH datasets, respectively, indicating that linguistic multi-incongruity can be exploited to improve metaphor detection performance.

Key words: metaphor detection, multi-incongruity network, linguistic methods, large language models, LoRA (Low-Rank Adaptation) fine-tuning
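
The three-part pipeline summarized in the abstract (encode three meanings, model several incongruities, classify) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the encoders, the feature construction, or the classifier, so everything below is an assumption made for illustration. In particular, the vectors stand in for encoder outputs, incongruity is approximated as one minus cosine similarity, the pairing used for SUC (sentence meaning against basic meaning) is a guess based on the problem statement, and the function names, weights, and bias are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def incongruity(u, v):
    """Assumed incongruity score: 0 for identical directions, 1 for orthogonal."""
    return 1.0 - cosine(u, v)


def multi_incongruity_features(sentence_vec, contextual_vec, basic_vec):
    """Build one incongruity score per linguistic view (pairings are assumptions)."""
    return {
        # SPV: target word in context vs. the surrounding sentence
        "SPV": incongruity(sentence_vec, contextual_vec),
        # MIP: contextual meaning vs. basic meaning of the target word
        "MIP": incongruity(contextual_vec, basic_vec),
        # SUC: sentence meaning vs. basic meaning (assumed pairing)
        "SUC": incongruity(sentence_vec, basic_vec),
    }


def detect_metaphor(features, weights, bias, threshold=0.5):
    """Hypothetical logistic classifier over the three incongruity scores."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    prob = 1.0 / (1.0 + math.exp(-z))
    return prob, prob >= threshold


# Toy stand-ins for encoder outputs (not real embeddings).
sentence_vec = [1.0, 0.0]
contextual_vec = [0.8, 0.6]
basic_vec = [0.0, 1.0]

features = multi_incongruity_features(sentence_vec, contextual_vec, basic_vec)
weights = {"SPV": 3.0, "MIP": 3.0, "SUC": 3.0}  # hypothetical, untrained
prob, is_metaphor = detect_metaphor(features, weights, bias=-2.0)
print(features, round(prob, 3), is_metaphor)
```

In a trained model the weights would be learned jointly with the encoders; the fixed values here only show how larger incongruity between the three meanings pushes the prediction toward "metaphorical".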

CLC Number: