Exploring large language model detector preference and robustness for fake news detection

doi:10.11772/j.issn.1001-9081.2026010128

Journal of Computer Applications (《计算机应用》) official website


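He Canyuan (何灿源)¹, Li Dayang (李大洋)¹, Liu Renyang (刘仁阳)², Zeng Yaxin (曾雅馨)¹, Song Bingbing (宋冰冰)¹, Yao Shaowen (姚绍文)³, Liang Yu (梁宇)³, Zhou Wei (周维)⁴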

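1. Yunnan University
2. National University of Singapore
3. School of Software, Yunnan University
4. School of Software, Yunnan University, Kunming 650091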

Received: 2026-02-09 Revised: 2026-04-20 Online: 2026-05-13 Published: 2026-05-13
Corresponding author: Zhou Wei (周维)

Exploring large language model detector preference and robustness for fake news detection


Abstract

Abstract: Large language models (LLMs), with their strong parametric knowledge and reasoning capabilities, have shown great potential for fake news detection, yet the robustness of their internal decision-making process remains largely unexplored. Observations show that LLMs follow stable reasoning trajectories during detection, similar to those of human fact-checkers; this phenomenon is defined as cognitive path preference. A Path Preference Evaluation Framework (PPEF) based on the "LLM as a Judge" paradigm is first proposed to formalize and quantify this phenomenon. PPEF extracts the explanatory rationales produced by detectors, maps them onto a preference feature ontology, and then identifies the path preferences shared by multiple LLMs. To examine whether malicious propagators could exploit these preferences and turn them into detection vulnerabilities, a Multi-View Rewrite (MVR) method is designed that selectively weakens the cues matching LLM preferences while preserving the semantics of the original news. Experiments on multiple public datasets with different detection models show that LLMs rely significantly and consistently on a small set of core cues during fake news detection. Compared with five non-preference-oriented rewriting methods, including Style-Based Rewriting (SBG) and Open-ended Generation (OEG), MVR, which specifically weakens preference features, causes more severe degradation in detector performance. On the Gossipcop dataset, adversarial samples generated by MVR reduce the detection rate of the basic chain-of-thought (CoT) driven LLM (Vanilla LLM) by 1.8 to 46.7 percentage points, a significantly stronger interference effect than that of adversarial samples generated by baselines such as SBG and OEG. These results reveal a practically exploitable LLM vulnerability and offer new insights for building preference-aware, path-diversified detection frameworks.
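The abstract does not give the details of PPEF, so the following is only a minimal sketch of the aggregation step it describes: once each detector's rationales have been mapped to ontology features, count how often each feature appears and keep the features that every detector relies on. The ontology entries, the detector names, the 0.3 threshold, and all function names are illustrative assumptions, not the authors' implementation. The last helper shows how a drop "in percentage points" is computed from two detection rates.

```python
from collections import Counter

# Hypothetical preference feature ontology; the paper's real ontology
# is not listed in the abstract.
ONTOLOGY = ("source_credibility", "emotional_tone",
            "factual_consistency", "writing_style")


def preference_distribution(labels):
    """Share of rationales mapped to each ontology feature."""
    counts = Counter(label for label in labels if label in ONTOLOGY)
    total = sum(counts.values())
    if total == 0:
        return {feature: 0.0 for feature in ONTOLOGY}
    return {feature: counts[feature] / total for feature in ONTOLOGY}


def shared_preferences(per_detector_labels, threshold=0.3):
    """Features whose share reaches the (assumed) threshold for every detector."""
    distributions = [preference_distribution(labels)
                     for labels in per_detector_labels.values()]
    if not distributions:
        return []
    return [feature for feature in ONTOLOGY
            if all(dist[feature] >= threshold for dist in distributions)]


def drop_in_percentage_points(rate_before, rate_after):
    """Absolute drop between two rates in [0, 1], in percentage points."""
    return round((rate_before - rate_after) * 100, 1)


# Toy rationale labels, invented purely for illustration.
per_detector_labels = {
    "detector_a": ["emotional_tone", "emotional_tone",
                   "source_credibility", "writing_style"],
    "detector_b": ["emotional_tone", "source_credibility",
                   "source_credibility", "emotional_tone",
                   "factual_consistency"],
}

shared = shared_preferences(per_detector_labels)
```

In this toy data only `emotional_tone` clears the threshold for both detectors, so it would be the cue a preference-targeted rewrite such as MVR tries to weaken. The labeling step itself (the "LLM as a Judge" mapping from rationale text to features) is outside the scope of the sketch.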

Key words: Multi-View Rewrite (MVR)

CLC Number: TP309.2

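He Canyuan, Li Dayang, Liu Renyang, Zeng Yaxin, Song Bingbing, Yao Shaowen, Liang Yu, Zhou Wei. Exploring large language model detector preference and robustness for fake news detection[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2026010128.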

