Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (2): 510-518. DOI: 10.11772/j.issn.1001-9081.2021020360
Special topic: Cyberspace security
Quan CHEN, Li LI, Yongle CHEN, Yuexing DUAN
Received:
2021-03-10
Revised:
2021-04-28
Accepted:
2021-04-29
Online:
2022-02-11
Published:
2022-02-10
Contact:
Yongle CHEN
About author:
CHEN Quan, born in 1996, M. S. candidate. His research interests include deep learning and adversarial attacks.
Abstract:
To address the problem that interpretability in Deep Neural Networks (DNNs) leads to model information leakage, the feasibility of generating adversarial examples with the Grad-CAM explanation method in a white-box setting was first demonstrated, and an untargeted black-box attack algorithm, the dynamic genetic algorithm, was then proposed. The algorithm first improves the fitness function according to how the explanation region changes with the positions of the perturbed pixels, and then runs multiple rounds of a genetic algorithm that progressively decreases the perturbation value while increasing the number of perturbed pixels; the coordinate set produced in each round is retained and reused in the next round's iterations, until the set of perturbed pixels flips the predicted label without exceeding the perturbation bound. In the experiments, the proposed algorithm achieved an average attack success rate of 92.88% on the AlexNet, VGG-19, ResNet-50, and SqueezeNet models. Compared with the One pixel algorithm, it increased running time by 8% but raised the success rate by 16.53 percentage points. In addition, within a shorter running time, its success rate was 3.18 percentage points higher than that of the Ada-FGSM algorithm and 8.63 percentage points higher than that of the PPBA algorithm, and was close to that of the Boundary-attack algorithm. The results show that the dynamic genetic algorithm based on the explanation method can carry out adversarial attacks effectively.
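The white-box feasibility argument above relies on Grad-CAM to locate the explanation region. For reference, a minimal NumPy sketch of the Grad-CAM computation itself, assuming the last convolutional layer's activations and the gradients of the target class score with respect to them have already been extracted from the network (array shapes and the function name are illustrative, not the paper's code):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from conv activations and class-score gradients.

    feature_maps, gradients: arrays of shape (C, H, W), where C is the
    number of channels in the chosen convolutional layer.
    """
    # alpha_c: global-average-pool the gradients over the spatial dimensions
    weights = gradients.mean(axis=(1, 2))
    # weighted sum of the feature maps over the channel axis
    cam = np.tensordot(weights, feature_maps, axes=1)
    # ReLU keeps only features with a positive influence on the class score
    return np.maximum(cam, 0.0)
```

In practice the resulting map is upsampled to the input resolution; the attack only needs the ranking of spatial locations, so the upsampling step is omitted here.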
Quan CHEN, Li LI, Yongle CHEN, Yuexing DUAN. Adversarial attack algorithm for deep learning interpretability[J]. Journal of Computer Applications, 2022, 42(2): 510-518.
| Algorithm | Running time/s |
| --- | --- |
| Dynamic genetic algorithm | 13.0 |
| Grey wolf algorithm | 14.6 |
| Cuckoo search algorithm | 20.1 |
| Salp swarm algorithm | 11.2 |

Tab. 1 Running times of different optimization algorithms
| Algorithm | AlexNet | VGG-19 | ResNet-50 | SqueezeNet |
| --- | --- | --- | --- | --- |
| BIM | 99.50 | 99.60 | 100.00 | 99.60 |
| DeepFool | 99.40 | 99.80 | 99.90 | 99.80 |
| Grad-CAM Attack (S=1) | 72.50 | 72.30 | 91.10 | 86.50 |
| Grad-CAM Attack (S=-1) | 48.50 | 16.50 | 9.90 | 39.00 |

Tab. 2 Untargeted attack success rates under different models (%)
| Algorithm | AlexNet | VGG-19 | ResNet-50 | SqueezeNet |
| --- | --- | --- | --- | --- |
| BIM | 99.50 | 99.60 | 100.00 | 99.60 |
| DeepFool | 99.40 | 99.80 | 99.90 | 99.80 |
| Grad-CAM Attack (S=1) | 92.30 | 93.30 | 95.90 | 94.70 |

Tab. 3 Untargeted attack success rates under different models after improvement (%)
| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Maximum distortion | 0.09 | Perturbed-pixel increment | 18 |
| Initial perturbation coefficient | 0.21 | Mutation rate | 0.05 |
| Perturbation decrement | 0.02 | Crossover rate | 0.96 |
| Number of iterations | 150 | | |

Tab. 4 Parameter settings
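The multi-round procedure described in the abstract can be sketched as follows, with defaults taken from the Table 4 settings. This is an illustrative reconstruction, not the authors' implementation: the `predict` function (returning class probabilities) and the precomputed saliency map standing in for the Grad-CAM explanation region are assumptions, and the overall distortion-budget check is omitted for brevity.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_genetic_attack(predict, image, saliency, pop_size=20, rounds=5,
                           base_pixels=18, pixel_step=18, eps0=0.21,
                           eps_step=0.02, iters=150, mut_rate=0.05,
                           cross_rate=0.96, seed=0):
    """Multi-round GA sketch: each round enlarges the perturbed-pixel set
    (taken from the most salient coordinates) and shrinks the per-pixel
    perturbation; the coordinate ranking is reused across rounds, and the
    loop stops as soon as the predicted label flips."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    orig = int(np.argmax(predict(image)))
    # coordinates ranked by saliency, most salient first
    ranked = np.stack(np.unravel_index(
        np.argsort(-saliency, axis=None), (h, w)), axis=1)
    for r in range(rounds):
        eps = max(eps0 - r * eps_step, 0.01)         # shrink perturbation value
        coords = ranked[:base_pixels + r * pixel_step]  # grow pixel set
        n = len(coords)
        # each individual is a vector of per-pixel perturbation signs
        pop = rng.choice([-1.0, 1.0], size=(pop_size, n))
        for _ in range(iters):
            fits = np.empty(pop_size)
            for i, ind in enumerate(pop):
                adv = image.copy()
                adv[coords[:, 0], coords[:, 1]] = np.clip(
                    adv[coords[:, 0], coords[:, 1]] + eps * ind, 0.0, 1.0)
                p = predict(adv)
                if int(np.argmax(p)) != orig:
                    return adv                        # label flipped: success
                fits[i] = -p[orig]    # fitness: confidence drop on true label
            # GA step: keep the better half, recombine, mutate
            order = np.argsort(-fits)
            parents = pop[order[:pop_size // 2]]
            children = parents.copy()
            mask = rng.random(children.shape) < cross_rate / 2
            children[mask] = parents[rng.permutation(len(parents))][mask]
            mut = rng.random(children.shape) < mut_rate
            children[mut] *= -1.0
            pop = np.vstack([parents, children])
    return None   # no adversarial example found within the given rounds
```

On a toy classifier whose decision depends on a single salient pixel, the first round already finds a sign vector that flips the label; against a real model, `predict` would be a wrapper around the black-box network's query interface.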
| Algorithm | AlexNet | VGG-19 | ResNet-50 | SqueezeNet | Average success rate |
| --- | --- | --- | --- | --- | --- |
| One pixel | 83.00 | 73.80 | 79.00 | 69.60 | 76.35 |
| Boundary-attack | 97.40 | 98.80 | 97.90 | 96.80 | 97.73 |
| Ada-FGSM | 90.61 | 91.74 | 89.57 | 86.88 | 89.70 |
| TREMBA | 93.90 | 95.80 | 92.20 | 93.72 | 93.91 |
| PPBA | 89.60 | 90.30 | 84.80 | 72.30 | 84.25 |
| Dynamic genetic algorithm (Grad-CAM) | 93.10 | 94.30 | 91.40 | 92.70 | 92.88 |

Tab. 5 Black-box attack success rates under different models (%)
| Algorithm | AlexNet | VGG-19 | ResNet-50 | SqueezeNet |
| --- | --- | --- | --- | --- |
| One pixel | 16.2 | 18.6 | 18.7 | 17.4 |
| Boundary-attack | 18.3 | 19.2 | 19.4 | 19.9 |
| Ada-FGSM | 19.2 | 19.9 | 20.5 | 19.8 |
| TREMBA | 29.6 | 31.3 | 32.2 | 30.4 |
| PPBA | 19.7 | 21.4 | 22.6 | 22.1 |
| Dynamic genetic algorithm (Grad-CAM) | 18.9 | 19.2 | 19.3 | 19.0 |

Tab. 6 Average processing time over 50 images (s)
| Group | Perturbation decrement | Perturbed-pixel increment | Running time/s | Success rate/% |
| --- | --- | --- | --- | --- |
| 1* | 0.02 | 18 | 19.0 | 92.88 |
| 2 | 0.02 | 15 | 55.2 | 94.20 |
| 3 | 0.04 | 18 | 16.8 | 89.30 |
| 4 | 0.03 | 17 | 29.8 | 91.60 |
| 5 | 0.01 | 19 | 24.4 | 90.80 |

Tab. 7 Performance comparison of different parameters (SqueezeNet)
1 WU F, LIAO B B, HAN Y H. Interpretability for deep learning[J]. Aero Weapon, 2019, 26(1): 39-46. 10.12132/ISSN.1673-5048.2018.0065
2 GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[EB/OL]. (2015-03-20) [2020-10-29].
3 KURAKIN A, GOODFELLOW I, BENGIO S. Adversarial examples in the physical world[EB/OL]. (2017-02-11) [2020-10-29].
4 CARLINI N, WAGNER D. Towards evaluating the robustness of neural networks[C]// Proceedings of the 2017 IEEE Symposium on Security and Privacy. Piscataway: IEEE, 2017: 39-57. 10.1109/sp.2017.49
5 PAPERNOT N, McDANIEL P, JHA S, et al. The limitations of deep learning in adversarial settings[C]// Proceedings of the 2016 IEEE European Symposium on Security and Privacy. Piscataway: IEEE, 2016: 372-387. 10.1109/eurosp.2016.36
6 SHI Y C, HAN Y H, ZHANG Q X, et al. Adaptive iterative attack towards explainable adversarial robustness[J]. Pattern Recognition, 2020, 105: No.107309. 10.1016/j.patcog.2020.107309
7 DONG X Y, HAN J F, CHEN D D, et al. Robust superpixel-guided attentional adversarial attack[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 12892-12901. 10.1109/cvpr42600.2020.01291
8 LI J, JI R R, LIU H, et al. Projection & probability-driven black-box attack[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 359-368. 10.1109/cvpr42600.2020.00044
9 HUANG Z C, ZHANG T. Black-box adversarial attack with transferable model-based embedding[EB/OL]. (2020-01-05) [2020-10-29].
10 SIMONYAN K, VEDALDI A, ZISSERMAN A. Deep inside convolutional networks: visualising image classification models and saliency maps[EB/OL]. (2014-04-19) [2020-10-29].
11 SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 618-626. 10.1109/iccv.2017.74
12 SZEGEDY C, ZAREMBA W, SUTSKEVER I, et al. Intriguing properties of neural networks[EB/OL]. (2014-02-19) [2020-10-29].
13 MOOSAVI-DEZFOOLI S M, FAWZI A, FROSSARD P. DeepFool: a simple and accurate method to fool deep neural networks[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2574-2582. 10.1109/cvpr.2016.282
14 SU J W, VARGAS D V, SAKURAI K. One pixel attack for fooling deep neural networks[J]. IEEE Transactions on Evolutionary Computation, 2019, 23(5): 828-841. 10.1109/tevc.2019.2890858
15 BRENDEL W, RAUBER J, BETHGE M. Decision-based adversarial attacks: reliable attacks against black-box machine learning models[EB/OL]. (2018-02-16) [2020-10-29].
16 GHORBANI A, ABID A, ZOU J. Interpretation of neural networks is fragile[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019: 3681-3688. 10.1609/aaai.v33i01.33013681
17 ZHANG X Y, WANG N F, SHEN H, et al. Interpretable deep learning under fire[C]// Proceedings of the 29th USENIX Security Symposium. Berkeley: USENIX Association, 2020: 1659-1676.
18 YE D P, CHEN C X, LIU C R, et al. Detection defense against adversarial attacks with saliency map[EB/OL]. (2020-09-06) [2020-10-29].
19 DABKOWSKI P, GAL Y. Real time image saliency for black box classifiers[EB/OL]. (2017-05-22) [2020-10-29].
20 FONG R C, VEDALDI A. Interpretable explanations of black boxes by meaningful perturbation[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 3449-3457. 10.1109/iccv.2017.371
21 SPRINGENBERG J T, DOSOVITSKIY A, BROX T, et al. Striving for simplicity: the all convolutional net[EB/OL]. (2015-04-13) [2020-10-29].
22 KINDERMANS P J, SCHÜTT K T, ALBER M, et al. Learning how to explain neural networks: PatternNet and PatternAttribution[EB/OL]. (2017-10-24) [2020-10-29].
23 RUDOLPH G. Convergence analysis of canonical genetic algorithms[J]. IEEE Transactions on Neural Networks, 1994, 5(1): 96-101. 10.1109/72.265964
24 KRIZHEVSKY A, HINTON G. Learning multiple layers of features from tiny images[EB/OL]. (2009-04-08) [2020-10-29].
25 DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255. 10.1109/cvpr.2009.5206848
26 SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10) [2020-10-29].
27 HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
28 IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[EB/OL]. (2016-11-04) [2020-10-29].
29 KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2012: 1097-1105.