Dynamic targeted recovery method for backdoor model purification

程欣铭1, 黄荣1, 刘浩2, 蒋学芹3

  1. Donghua University
  2. College of Information Science and Technology, Donghua University
  3. Donghua University
  • Received: 2025-06-23  Revised: 2025-09-17  Online: 2025-10-15  Published: 2025-10-15
  • Corresponding author: 黄荣
  • Supported by:
    National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities

Abstract: Backdoor attacks on Deep Neural Networks (DNN) severely compromise the trustworthiness of model decisions, and existing defense methods that rely on one-time pruning or global fine-tuning often lead to a significant degradation of the model's clean accuracy. To address this problem, a dynamic targeted recovery method for backdoor model purification was proposed. First, pre-activations were utilized to characterize neuron behavior, enabling the localization of poisoned neurons with abnormal behavior. During model purification, targeted recovery was implemented by fine-tuning only the located poisoned neurons, which avoided introducing disturbances to clean neurons and better maintained the model's clean accuracy. Second, during purification, neuron behavior was monitored to obtain feedback on the purification process, allowing poisoned neurons to be re-localized dynamically. In this process, a tabu search strategy was introduced to exclude interference from stubborn neurons that contribute little to purification, thereby accelerating convergence. Experiments on 3 benchmark datasets against 6 backdoor attacks, including BadNets (Backdoored Neural Network), showed that the proposed method reduced the Attack Success Rate (ASR) to 0.21% on average and improved the clean Accuracy (ACC) by 0.1–2.9 percentage points, outperforming 5 other defense methods such as ABL (Anti-Backdoor Learning). The proposed dynamic targeted recovery method effectively overcomes the clean accuracy degradation caused by one-time pruning or global fine-tuning in traditional methods, providing a more reliable solution for enhancing DNN security.

Key words: backdoor model purification, targeted recovery, dynamic localization, pre-activation, tabu search
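
Since the abstract only outlines the procedure, the PyTorch sketch below is one possible reading of it rather than the authors' implementation. The helper names (channel_preactivation_stats, locate_poisoned, purify), the per-channel mean absolute pre-activation statistic, the median-deviation anomaly score, the top-k selection, and the move_eps threshold for tagging stubborn neurons are all assumptions introduced here for illustration; the sketch also assumes the inspected layer is a convolution followed by a separate nonlinearity, so that the layer's raw output can serve as the pre-activation.

```python
# Illustrative sketch only: the statistics, thresholds, and helper names are
# assumptions, not the method as published.
import torch
import torch.nn.functional as F


@torch.no_grad()
def channel_preactivation_stats(model, layer, clean_loader, device="cpu"):
    """Mean absolute pre-activation per channel of `layer` on a small clean
    set. Assumes `layer` is a Conv2d whose nonlinearity is a separate module,
    so its raw output is the pre-activation."""
    grabbed = {}

    def hook(_module, _inputs, output):
        grabbed["z"] = output.detach()

    handle = layer.register_forward_hook(hook)
    total, batches = None, 0
    for x, _ in clean_loader:
        model(x.to(device))
        z = grabbed["z"].abs().mean(dim=(0, 2, 3))  # one value per channel
        total = z if total is None else total + z
        batches += 1
    handle.remove()
    return total / batches


def locate_poisoned(stats, tabu, k=10):
    """Flag the k channels whose statistic deviates most from the median
    (assumed anomaly score), skipping channels already on the tabu list."""
    deviation = (stats - stats.median()).abs()
    ranked = torch.argsort(deviation, descending=True).tolist()
    return [c for c in ranked if c not in tabu][:k]


def purify(model, layer, clean_loader, rounds=5, lr=1e-3, move_eps=1e-3,
           device="cpu"):
    """Dynamic targeted recovery loop: re-localize suspicious channels every
    round, fine-tune only those channels' filters on clean data, and place
    channels that barely change (stubborn, low contribution) on a tabu list."""
    model.to(device)
    tabu = set()
    for _ in range(rounds):
        stats = channel_preactivation_stats(model, layer, clean_loader, device)
        targets = locate_poisoned(stats, tabu)
        if not targets:
            break
        before = layer.weight.data[targets].clone()

        # Gradient mask restricts updates to the flagged channels only,
        # leaving clean neurons untouched (targeted recovery).
        mask = torch.zeros_like(layer.weight)
        mask[targets] = 1.0
        opt = torch.optim.SGD([layer.weight], lr=lr)
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            model.zero_grad(set_to_none=True)
            F.cross_entropy(model(x), y).backward()
            layer.weight.grad.mul_(mask)
            opt.step()

        # Tabu step: channels whose filters moved less than `move_eps`
        # contributed little to purification and are skipped in later rounds.
        moved = (layer.weight.data[targets] - before).flatten(1).norm(dim=1)
        tabu.update(c for c, m in zip(targets, moved) if m.item() < move_eps)
    return model
```

A typical call would be purify(model, some_conv_layer, clean_loader), where some_conv_layer is a late convolutional layer of the backdoored network and clean_loader holds a small clean calibration set; the gradient mask realizes the idea of touching only the flagged neurons, while the tabu set keeps neurons that stopped responding to fine-tuning out of later localization rounds.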
