Journal of Computer Applications, 2025, Vol. 45, Issue 12: 3829-3838. DOI: 10.11772/j.issn.1001-9081.2024121797

• Artificial intelligence

Metaphor detection model based on linguistic multi-incongruity

Tianlong ZHENG1,2,3, Rui DONG1,2,3, Yating YANG1,2,3, Bo MA1,2,3, Lei WANG1,2,3, Xi ZHOU1,2,3   

  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang 830011, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing (Chinese Academy of Sciences), Urumqi, Xinjiang 830011, China
  • Received: 2024-12-23; Revised: 2025-03-06; Accepted: 2025-03-13; Online: 2025-03-24; Published: 2025-12-10
  • Contact: Rui DONG
  • About the authors: ZHENG Tianlong, born in 2002, M.S. candidate. His research interests include metaphor detection.
    DONG Rui, born in 1985, Ph.D., research fellow. His research interests include natural language processing and metaphor detection.
    YANG Yating, born in 1985, Ph.D., research fellow. Her research interests include multilingual intelligent information processing.
    MA Bo, born in 1984, Ph.D., research fellow. His research interests include multilingual intelligent information processing.
    WANG Lei, born in 1974, Ph.D., research fellow. His research interests include multilingual intelligent information processing.
    ZHOU Xi, born in 1978, Ph.D., research fellow. His research interests include big data analysis.
  • Supported by:
    Key Project of Xinjiang Uygur Autonomous Region Natural Science Foundation (2024D01D29); Xinjiang Uygur Autonomous Region “Tianshan Talents” Training Program (2022TSYCCX0059); Project of Youth Innovation Promotion Association, Chinese Academy of Sciences (Y2021112); Xinjiang Uygur Autonomous Region Key Research and Development Program (2023B03024)

Metaphor detection model based on linguistic multi-incongruity

ZHENG Tianlong1,2,3, DONG Rui1,2,3, YANG Yating1,2,3, MA Bo1,2,3, WANG Lei1,2,3, ZHOU Xi1,2,3

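  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011
  2. University of Chinese Academy of Sciences, Beijing 100049
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing (Chinese Academy of Sciences), Urumqi 830011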
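  • Corresponding author: DONG Rui
  • Author biographies: ZHENG Tianlong (2002-), male, born in Fuyang, Anhui, M.S. candidate. Main research interest: metaphor detection.
    DONG Rui (1985-), male, born in Weihai, Shandong, research fellow, Ph.D., CCF senior member. Main research interests: natural language processing, metaphor detection.
    YANG Yating (1985-), female, born in Qitai, Xinjiang, research fellow, Ph.D., CCF senior member. Main research interest: multilingual intelligent information processing.
    MA Bo (1984-), male, born in Anshan, Liaoning, research fellow, Ph.D., CCF senior member. Main research interest: multilingual intelligent information processing.
    WANG Lei (1974-), male, born in Nanyang, Henan, research fellow, Ph.D. Main research interest: multilingual intelligent information processing.
    ZHOU Xi (1978-), male, born in Shuangfeng, Hunan, research fellow, Ph.D., CCF senior member. Main research interest: big data analysis.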
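  • Funding:
    Key Project of Xinjiang Uygur Autonomous Region Natural Science Foundation (2024D01D29); Key Project of Xinjiang Uygur Autonomous Region Natural Science Foundation (2022D01D04); Key Project of Xinjiang Uygur Autonomous Region Natural Science Foundation (2022D01D81); Xinjiang Uygur Autonomous Region “Tianshan Talents” Training Program (2022TSYCCX0059); Xinjiang Uygur Autonomous Region “Tianshan Talents” Training Program (2023TSYCCX0044); Xinjiang Uygur Autonomous Region “Tianshan Talents” Training Program (2023TSYCCX0041); Project of Youth Innovation Promotion Association, Chinese Academy of Sciences (Y2021112); Project of Youth Innovation Promotion Association, Chinese Academy of Sciences (Y2023118); Project of Youth Innovation Promotion Association, Chinese Academy of Sciences (2021436); Xinjiang Uygur Autonomous Region Key Research and Development Program (2023B03024)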

Abstract:

Existing metaphor detection research overlooks metaphors that arise when, in a specific context, the core meaning of a polysemous target word is incongruent with the meaning of the target sentence. To address this problem, a metaphor detection model based on linguistic multi-incongruity was proposed. Firstly, in the feature encoding module, two separate encoders were employed to encode feature information including the target sentence meaning, the core meaning of the target word, and the contextual meaning of the target word. Then, in the multi-incongruity modeling module, three linguistic methods, namely Selectional Preference Violation (SPV), Metaphor Identification Procedure (MIP), and Semantics Usage Comparison (SUC), were used to model the incongruity features in a unified manner. Finally, metaphor detection was performed by a metaphor identification module. In addition, to validate Chinese metaphor detection performance, a Chinese word-level metaphor detection dataset named META-ZH was constructed with an annotation method that combines a Large Language Model (LLM) fine-tuned with Low-Rank Adaptation (LoRA) and manual correction. Experimental results show that, compared with the best baseline model, the proposed model improves F1 by 0.8, 1.3, 1.5, and 2.3 percentage points on the VUA All, VUA Verb, MOH-X, and META-ZH metaphor detection datasets, respectively. These results indicate that the proposed model improves metaphor detection performance by making full use of linguistic multi-incongruity.
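The unified modeling of the three incongruity signals can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network details, so the vector pairing for each method, the pair-feature form (concatenation, absolute difference, product), the logistic scorer, and all function names below are assumptions made for illustration. In particular, pairing SUC with the sentence meaning and the core meaning is inferred from the problem statement above.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def pair_features(a: Vector, b: Vector) -> List[float]:
    """Join two vectors with their element-wise absolute difference and product."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    diff = [abs(x - y) for x, y in zip(a, b)]
    prod = [x * y for x, y in zip(a, b)]
    return list(a) + list(b) + diff + prod


def multi_incongruity_features(sentence: Vector, context: Vector, core: Vector) -> List[float]:
    """Build one feature vector from three incongruity comparisons (assumed pairings)."""
    spv = pair_features(sentence, context)  # SPV: sentence meaning vs. contextual word meaning
    mip = pair_features(context, core)      # MIP: contextual word meaning vs. core word meaning
    suc = pair_features(sentence, core)     # SUC (assumed): sentence meaning vs. core word meaning
    return spv + mip + suc


def metaphor_probability(features: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical identification step: a logistic score over the unified features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 2-dimensional vectors standing in for encoder outputs.
sentence_vec = [0.2, -0.1]
context_vec = [0.5, 0.3]
core_vec = [-0.4, 0.9]

features = multi_incongruity_features(sentence_vec, context_vec, core_vec)
untrained = metaphor_probability(features, [0.0] * len(features))
```

In a trained model the encoder outputs would replace the toy vectors and the weights would be learned; with all-zero weights the scorer is uninformative by construction.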

Key words: metaphor detection, Multi-incongruity Network (MulNet), linguistic method, Large Language Model (LLM), LoRA (Low-Rank Adaptation) fine-tuning

Abstract:

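Existing metaphor detection research ignores metaphors caused by the incongruity between the meaning of the target sentence and the basic meaning of the target word when the target word has multiple meanings (polysemy) in a specific context. To address this problem, a metaphor detection model based on linguistic multi-incongruity is proposed. First, in the feature encoding module, two independent encoders encode feature information such as the meaning of the target sentence and the basic and contextual meanings of the target word. Second, in the multi-incongruity modeling module, three linguistic methods, Selectional Preference Violation (SPV), Metaphor Identification Procedure (MIP), and Semantics Usage Comparison (SUC), are used to model the multi-incongruity features in a unified manner. Finally, the metaphor identification module performs metaphor detection. In addition, a Chinese word-level metaphor detection dataset, META-ZH, is constructed with a data annotation method that combines a Large Language Model (LLM) fine-tuned with LoRA (Low-Rank Adaptation) and manual correction, in order to validate Chinese metaphor detection performance. Experimental results show that, compared with the best baseline model, the proposed model improves F1 by 0.8, 1.3, 1.5, and 2.3 percentage points on the VUA All, VUA Verb, MOH-X, and META-ZH metaphor detection datasets, respectively. The model therefore makes full use of linguistic multi-incongruity to improve metaphor detection performance effectively.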

Key words: metaphor detection, multi-incongruity network, linguistic method, large language model, LoRA fine-tuning

CLC Number: