Journal of Computer Applications (《计算机应用》), 2026, Vol. 46, Issue (3): 821-829. DOI: 10.11772/j.issn.1001-9081.2025030384

• 网络空间安全 •

基于直接引导扩散模型的对抗净化方法

胡岩1, 李鹏1,2, 成姝燕1

  1. 南京邮电大学 计算机学院、软件学院、网络空间安全学院,南京 210023
    2. 南京邮电大学 网络安全与可信计算研究所,南京 210023
  • 收稿日期:2025-04-15 修回日期:2025-06-04 接受日期:2025-06-05 发布日期:2025-07-01 出版日期:2026-03-10
  • 通讯作者: 李鹏
  • 作者简介:胡岩(2001—),男,江苏泰州人,硕士研究生,主要研究方向:深度学习、对抗攻击与防御;
    成姝燕(1999—),女,山西临汾人,博士研究生,主要研究方向:深度学习、对抗攻击与防御。
  • 基金资助:
    国家自然科学基金资助项目(62102194);江苏省“六大人才高峰”高层次人才项目(RJFW-111)

Adversarial purification method based on directly guided diffusion model

Yan HU1, Peng LI1,2, Shuyan CHENG1

  1. School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China
    2. Institute of Network Security and Trusted Computing, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China
  • Received:2025-04-15 Revised:2025-06-04 Accepted:2025-06-05 Online:2025-07-01 Published:2026-03-10
  • Contact: Peng LI
  • About author:HU Yan, born in 2001, M. S. candidate. His research interests include deep learning, adversarial attack and defense.
    CHENG Shuyan, born in 1999, Ph. D. candidate. Her research interests include deep learning, adversarial attack and defense.
  • Supported by:
    National Natural Science Foundation of China(62102194);“Six Talents Peaks” High-level Talent Project of Jiangsu Province(RJFW-111)

摘要:

深度神经网络(DNN)容易受到对抗扰动的影响,因此攻击者会通过向图像中添加难以察觉的对抗扰动以欺骗DNN。虽然基于扩散模型的对抗净化方法可以使用扩散模型生成干净样本以防御此类攻击,但扩散模型本身也会受到对抗扰动的影响。因此,提出对抗净化方法StraightDiffusion,使用对抗样本直接引导扩散模型的净化过程。首先,探讨现有方法在使用扩散模型进行对抗净化时存在的关键问题与局限性;其次,提出一种新的采样方式,在去噪过程中使用两阶段引导方式——头引导和尾引导,即仅在去噪过程的初期和末期进行引导,其他阶段不使用引导。在CIFAR-10和ImageNet数据集上使用3个分类器WideResNet-70-16、WideResNet-28-10和ResNet-50的实验结果表明,StraightDiffusion具有超过基线方法的防御性能,相较于扩散模型用于对抗净化方法(DiffPure)和净化引导扩散模型(GDMP)等方法取得了最好的标准准确率和鲁棒准确率。以上验证了所提方法能够提升净化效果,从而提高分类模型面对对抗样本的鲁棒准确率,实现了多攻击场景下的有效防御。

关键词: 对抗扰动, 对抗净化, 扩散模型, 鲁棒准确率, 神经网络, 引导

Abstract:

Deep Neural Networks (DNNs) are susceptible to adversarial perturbations, so attackers may deceive DNNs by adding imperceptible adversarial perturbations to images. Adversarial purification methods based on diffusion models use a diffusion model to generate clean samples to defend against such attacks, but diffusion models themselves are also susceptible to adversarial perturbations. Therefore, an adversarial purification method named StraightDiffusion was proposed, in which the purification process of the diffusion model was guided directly by adversarial samples. Firstly, the key problems and limitations of existing methods when using diffusion models for adversarial purification were discussed. Secondly, a new sampling method was proposed, in which a two-stage guidance approach, head guidance and tail guidance, was used in the denoising process: guidance was applied only in the early and late stages of the denoising process, and not in the other stages. Experimental results on the CIFAR-10 and ImageNet datasets using three classifiers, WideResNet-70-16, WideResNet-28-10, and ResNet-50, show that StraightDiffusion outperforms the baseline methods in defense performance: compared with methods such as Diffusion models for adversarial Purification (DiffPure) and the Guided Diffusion Model for Purification (GDMP), it achieves the best standard and robust accuracies on both datasets. These results verify that the proposed method improves purification performance, thereby enhancing the robust accuracy of classification models against adversarial samples and achieving effective defense under multiple attack scenarios.
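The two-stage (head/tail) guided sampling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the forward-noising scale, guidance strength, step count, and the `denoise_step` model call are all hypothetical placeholders standing in for a trained diffusion model.

```python
import numpy as np

def straight_diffusion_purify(x_adv, denoise_step, T=100, head=10, tail=10,
                              guide_scale=0.1, noise_scale=0.5, rng=None):
    """Sketch of two-stage (head/tail) guided denoising for purification.

    x_adv        : adversarial input image (array)
    denoise_step : callable (x, t) -> x, one reverse-diffusion step
                   (placeholder for a trained diffusion model)
    head, tail   : number of early/late steps in which guidance is applied
    """
    rng = np.random.default_rng() if rng is None else rng
    # Forward-diffuse the adversarial image to a noisy starting point,
    # which washes out the (small) adversarial perturbation.
    x = x_adv + rng.normal(scale=noise_scale, size=x_adv.shape)
    for t in range(T, 0, -1):
        x = denoise_step(x, t)
        # Head guidance (first `head` steps) and tail guidance (last `tail`
        # steps): pull the sample toward the adversarial input only at the
        # two ends of the denoising trajectory; middle steps are unguided.
        if t > T - head or t <= tail:
            x = x - guide_scale * (x - x_adv)
    return x
```

With a real diffusion model, `denoise_step` would evaluate the learned noise predictor at timestep `t`; the head guidance keeps early denoising anchored to the input's global content, while the tail guidance restores fine detail that pure unguided sampling would hallucinate away.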

Key words: adversarial perturbation, adversarial purification, diffusion model, robust accuracy, neural network, guidance

中图分类号: