Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (1): 94-100. DOI: 10.11772/j.issn.1001-9081.2023060854

• Artificial Intelligence •

Adversarial training method based on adaptive attack strength

CHEN Tong, WEI Jiwei, HE Shiyuan, SONG Jingkuan, YANG Yang

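  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731
  • Received: 2023-07-01 Revised: 2023-08-24 Accepted: 2023-08-28 Online: 2023-09-14 Published: 2024-01-10
  • Corresponding author: WEI Jiwei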
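  • About the authors: CHEN Tong (born 2000), male, from Yancheng, Jiangsu, M.S. candidate. Research interests: deep learning, adversarial attack and defense.
    HE Shiyuan (born 1995), male, from Xining, Qinghai, Ph.D. Research interests: adversarial attack and defense, multimedia retrieval.
    SONG Jingkuan (born 1986), male, from Huai'an, Jiangsu, professor, Ph.D., CCF professional member. Research interests: large-scale multimedia retrieval, image/video segmentation, image/video understanding.
    YANG Yang (born 1983), male, from Dalian, Liaoning, professor, Ph.D., CCF senior member. Research interests: multimedia retrieval, social media analysis, machine learning.
    First contact: WEI Jiwei (born 1991), male, from Xiangcheng, Henan, Ph.D., CCF member. Research interests: adversarial attack and defense, metric learning, cross-modal retrieval.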
  • Funding:
    National Natural Science Foundation of China (U20B2063); China Postdoctoral Science Foundation (2022M720660)

Adversarial training method with adaptive attack strength

Tong CHEN, Jiwei WEI, Shiyuan HE, Jingkuan SONG, Yang YANG

  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 611731, China
  • Received: 2023-07-01 Revised: 2023-08-24 Accepted: 2023-08-28 Online: 2023-09-14 Published: 2024-01-10
  • Contact: Jiwei WEI
  • About the authors: CHEN Tong, born in 2000, M.S. candidate. His research interests include deep learning and adversarial attack and defense.
    HE Shiyuan, born in 1995, Ph.D. His research interests include adversarial attack and defense and multimedia retrieval.
    SONG Jingkuan, born in 1986, Ph.D., professor. His research interests include large-scale multimedia retrieval, image/video segmentation, and image/video understanding.
    YANG Yang, born in 1983, Ph.D., professor. His research interests include multimedia retrieval, social media analysis, and machine learning.
  • Supported by:
    National Natural Science Foundation of China (U20B2063); China Postdoctoral Science Foundation (2022M720660)

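Abstract:

The vulnerability of deep neural networks (DNNs) to adversarial examples has raised serious concerns about the security and reliability of artificial intelligence systems, and adversarial training is an effective way to improve adversarial robustness. Existing methods use a fixed adversarial example generation strategy and overlook how much the generation stage matters to adversarial training. To address this, an adversarial training method based on adaptive attack strength is proposed. First, clean and adversarial examples are fed into the model to obtain its outputs; next, the difference between the model's outputs on the clean and adversarial examples is computed; finally, the change in this difference relative to the previous step is measured and the strength of the adversarial examples is adjusted automatically. Comprehensive experiments on three benchmark datasets show that, compared with the baseline Projected Gradient Descent Adversarial Training (PGD-AT), the method improves robust accuracy under AutoAttack (AA) by 1.92, 1.50, and 3.35 percentage points on the three datasets, respectively, and that it outperforms the state-of-the-art defense Adversarial Training with Learnable Attack Strategy (LAS-AT) in both robustness and natural accuracy. In addition, from a data augmentation perspective, the method effectively mitigates the problem that the augmentation effect of adversarial training, a special form of data augmentation, keeps declining as training progresses.

Keywords: adversarial training, adversarial example, adversarial defense, adaptive attack strength, deep learning, image classification, artificial intelligence security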

Abstract:

The vulnerability of Deep Neural Networks (DNNs) to adversarial attacks has raised significant concerns about the security and reliability of artificial intelligence systems. Adversarial training is an effective approach to enhancing adversarial robustness. To address the issue that existing methods adopt fixed adversarial example generation strategies and neglect the importance of the generation phase for adversarial training, an adversarial training method based on adaptive attack strength was proposed. Firstly, the clean sample and the adversarial sample were input into the model to obtain its outputs. Then, the difference between the model outputs for the clean sample and the adversarial sample was calculated. Finally, the change in this difference relative to the previous step was measured, and the strength of the adversarial sample was adjusted automatically. Comprehensive experimental results on three benchmark datasets demonstrate that, compared with the baseline method Adversarial Training with Projected Gradient Descent (PGD-AT), the proposed method improves robust accuracy under AutoAttack (AA) by 1.92, 1.50 and 3.35 percentage points on the three datasets, respectively, and that it outperforms the state-of-the-art defense method Adversarial Training with Learnable Attack Strategy (LAS-AT) in terms of both robustness and natural accuracy. Furthermore, from the perspective of data augmentation, the proposed method effectively addresses the problem that the augmentation effect of adversarial training, a special form of data augmentation, keeps diminishing as training progresses.

Key words: adversarial training, adversarial example, adversarial defense, adaptive attack strength, deep learning, image classification, artificial intelligence security
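
The three steps described in the abstract can be sketched as a small controller. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the output difference is measured or how the strength is updated, so the KL-divergence gap, the additive step rule, and the names `output_gap`, `AdaptiveStrength`, `eps`, `step`, `eps_min` and `eps_max` are all hypothetical. It assumes the attack strength is a perturbation budget that is raised when the clean/adversarial gap shrinks and lowered when the gap grows.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_gap(clean_logits, adv_logits):
    """Mean KL(p_clean || p_adv) over a batch of logit rows.

    Assumption: KL divergence stands in for the unspecified
    "difference between model outputs" in the abstract.
    """
    tiny = 1e-12  # guards against log(0) and division by zero
    total = 0.0
    for c, a in zip(clean_logits, adv_logits):
        p, q = softmax(c), softmax(a)
        total += sum(pi * math.log((pi + tiny) / (qi + tiny))
                     for pi, qi in zip(p, q))
    return total / len(clean_logits)


class AdaptiveStrength:
    """Hypothetical controller for the perturbation budget eps."""

    def __init__(self, eps=8 / 255, step=1 / 255,
                 eps_min=2 / 255, eps_max=16 / 255):
        self.eps = eps
        self.step = step
        self.eps_min = eps_min
        self.eps_max = eps_max
        self.prev_gap = None  # gap observed at the previous step

    def update(self, gap):
        """Compare the gap with the previous one and adjust eps."""
        if self.prev_gap is not None:
            if gap < self.prev_gap:
                # Attack has become less effective: strengthen it.
                self.eps = min(self.eps + self.step, self.eps_max)
            elif gap > self.prev_gap:
                # Attack has become more effective: weaken it.
                self.eps = max(self.eps - self.step, self.eps_min)
        self.prev_gap = gap
        return self.eps


# Usage with made-up logits for a batch of two samples.
ctrl = AdaptiveStrength()
clean = [[2.0, 0.0], [0.0, 3.0]]
adv = [[0.5, 1.0], [1.0, 1.5]]
next_eps = ctrl.update(output_gap(clean, adv))
```

In a real training loop, `next_eps` would be passed to whatever attack generates the next batch of adversarial examples (for instance a PGD attack), and the logits would come from the model being trained rather than from fixed lists.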

CLC number: