Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (2): 510-518. DOI: 10.11772/j.issn.1001-9081.2021020360

• Cyberspace Security •


Adversarial attack algorithm for deep learning interpretability

Quan CHEN, Li LI, Yongle CHEN, Yuexing DUAN

  1. College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2021-03-10 Revised: 2021-04-28 Accepted: 2021-04-29 Online: 2021-05-10 Published: 2022-02-10
  • Contact: Yongle CHEN
  • About author: CHEN Quan, born in 1996, M. S. candidate. His research interests include deep learning and adversarial attack.
    LI Li, born in 1997, M. S. candidate. Her research interests include botnet and Internet of Things device identification.
    CHEN Yongle, born in 1983, Ph. D., professor. His research interests include industrial control security and Internet of Things security.
    DUAN Yuexing, born in 1964, M. S., associate professor. His research interests include the Semantic Web.
  • Supported by:
    Key Research and Development Program of Shanxi Province (201903D121121)


Abstract:

Aiming at the problem of model information leakage caused by interpretability in Deep Neural Networks (DNN), the feasibility of using the Gradient-weighted Class Activation Mapping (Grad-CAM) interpretation method to generate adversarial examples in a white-box environment was demonstrated, and an untargeted black-box attack algorithm named dynamic genetic algorithm was proposed. In this algorithm, the fitness function was first improved according to the relationship between the interpretation area and the positions of the perturbed pixels. Then, through multiple rounds of the genetic algorithm, the perturbation value was continuously reduced while the number of perturbed pixels was increased, and the coordinate set obtained in each round was retained and reused in the next round of iteration, until the set of perturbed pixels flipped the predicted label without exceeding the perturbation boundary. In the experiments, the average attack success rate of the proposed algorithm on the AlexNet, VGG-19, ResNet-50 and SqueezeNet models was 92.88%; compared with the One pixel algorithm, the running time increased by 8%, but the success rate increased by 16.53 percentage points. In addition, within a shorter running time, the success rate of the proposed algorithm was 3.18 percentage points higher than that of the Adaptive Fast Gradient Sign Method (Ada-FGSM) algorithm, 8.63 percentage points higher than that of the Projection & Probability-driven Black-box Attack (PPBA) algorithm, and close to that of the Boundary-attack algorithm. The results show that the dynamic genetic algorithm based on the interpretation method can effectively perform adversarial attacks.
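The abstract describes the attack in prose only. The sketch below is a minimal, self-contained illustration of the core idea under stated assumptions: a genetic search whose fitness combines the drop in the true-class probability with a bonus for placing perturbed pixels inside the region a saliency map (which Grad-CAM would supply) marks as important, run over multiple rounds that shrink the per-pixel perturbation while growing the pixel set and reusing the previous round's best coordinates. All names (query_model, random_pixels, evolve), the 0.5 saliency weight, and the population sizes are hypothetical stand-ins, not the authors' exact formulation.

```python
# Minimal illustrative sketch (NOT the paper's exact algorithm): a genetic
# search for a sparse pixel perturbation, biased toward the salient region.
import numpy as np

rng = np.random.default_rng(0)

H = W = 32                       # toy image size
image = rng.random((H, W, 3))    # stand-in for the clean input image
saliency = rng.random((H, W))    # stand-in for a Grad-CAM heat map in [0, 1]
true_label = 3

def query_model(x):
    """Placeholder for the black-box classifier: returns class probabilities.
    In a real attack, this is the attacker's only access to the model."""
    logits = rng.random(10)      # replace with the target model's output
    return logits / logits.sum()

def apply_perturbation(candidate):
    """Apply a candidate: a list of (row, col, rgb_delta) triples."""
    x = image.copy()
    for r, c, delta in candidate:
        x[r, c] = np.clip(x[r, c] + delta, 0.0, 1.0)
    return x

def fitness(candidate):
    """Reward lowering the true-class probability, plus a bonus for placing
    perturbed pixels inside the salient (interpreted) region."""
    p_true = query_model(apply_perturbation(candidate))[true_label]
    saliency_bonus = np.mean([saliency[r, c] for r, c, _ in candidate])
    return (1.0 - p_true) + 0.5 * saliency_bonus   # 0.5: arbitrary trade-off

def random_pixels(n, eps):
    """Sample n random pixel coordinates with deltas bounded by eps."""
    return [(int(rng.integers(H)), int(rng.integers(W)),
             rng.uniform(-eps, eps, size=3)) for _ in range(n)]

def evolve(population, eps, generations=20):
    """One round: keep the fitter half, refill by mutating one pixel each."""
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: len(population) // 2]
        children = [p[:-1] + random_pixels(1, eps) for p in parents]
        population = parents + children
    return max(population, key=fitness)

# Multi-round driver: each round shrinks the per-pixel budget while adding a
# pixel, seeding the population with the previous round's best coordinates.
best = random_pixels(1, eps=0.3)
for round_idx in range(3):
    eps = 0.3 * 0.7 ** round_idx             # decreasing perturbation value
    population = [best + random_pixels(1, eps) for _ in range(8)]
    best = evolve(population, eps)
    if np.argmax(query_model(apply_perturbation(best))) != true_label:
        break                                # predicted label flipped: success
```

In the actual setting, query_model would forward the image through AlexNet, VGG-19, ResNet-50 or SqueezeNet, and the saliency map would come from Grad-CAM; both are random stand-ins here so the sketch runs on its own.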

Key words: Deep Neural Network (DNN), interpretation method, saliency map, adversarial attack, genetic algorithm

CLC number: