Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (10): 3093-3098. DOI: 10.11772/j.issn.1001-9081.2022091468

• Artificial intelligence •

Text adversarial example generation method based on BERT model

Yuhang LI, Yuli YANG, Yao MA, Dan YU, Yongle CHEN

  1. College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, Shanxi 030600, China
  • Received: 2022-10-08 Revised: 2023-02-19 Accepted: 2023-02-23 Online: 2023-04-17 Published: 2023-10-10
  • Contact: Yongle CHEN
  • About author: LI Yuhang, born in 1998, M.S. candidate, CCF member. His research interests include artificial intelligence.
    YANG Yuli, born in 1979, Ph.D., lecturer, CCF member. Her research interests include trusted cloud service computing and blockchain.
    MA Yao, born in 1982, Ph.D., lecturer, CCF member. His research interests include Web security.
    YU Dan, born in 1988, Ph.D., CCF member. Her research interests include wireless sensor networks and the Internet of Things.
  • Supported by:
    Basic Research Program of Shanxi Province (20210302123131)

Abstract:

Aiming at the problem that existing adversarial example generation methods require a large number of queries to the target model, which leads to poor attack performance, a Text Adversarial Example Generation Method based on the BERT (Bidirectional Encoder Representations from Transformers) model (TAEGM) was proposed. Firstly, an attention mechanism was adopted to locate the keywords that significantly influence the classification result, without querying the target model. Secondly, word-level perturbations were applied to the keywords by the BERT model to generate candidate adversarial examples. Finally, the candidate examples were clustered, and the adversarial examples were selected from the clusters with greater influence on the classification result. Experimental results on the Yelp Reviews, AG News, and IMDB Review datasets show that, compared with CLARE (ContextuaLized AdversaRial Example generation model), the second-best method in terms of Success Rate (SR), TAEGM reduces the Query Counts (QC) to the target model by 62.3% and the time consumption by 68.6% on average while maintaining the SR of the adversarial attacks. Further experimental results verify that the adversarial examples generated by TAEGM not only have good transferability, but also improve the robustness of the model through adversarial training.

Key words: adversarial example, attention mechanism, BERT (Bidirectional Encoder Representations from Transformers), adversarial attack, clustering algorithm
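To make the three-step pipeline described in the abstract concrete, below is a minimal sketch in Python, assuming Hugging Face transformers and scikit-learn. It is an interpretation of the abstract only, not the authors' implementation: bert-base-uncased stands in for whatever surrogate and masked language models TAEGM actually uses, and the [CLS]-attention keyword scoring, the [CLS]-embedding clustering features, and the farthest-cluster selection rule are all illustrative assumptions.

```python
# Minimal sketch of a TAEGM-style pipeline (illustrative only, not the paper's code).
# Assumptions: Hugging Face transformers + scikit-learn; bert-base-uncased serves as
# both the attention surrogate and the masked language model.
import torch
from sklearn.cluster import KMeans
from transformers import BertForMaskedLM, BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")
enc_model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

def locate_keywords(text, top_k=3):
    # Step 1: rank tokens by the attention they receive from [CLS] in the last
    # layer, averaged over heads -- no query to the target model is needed.
    enc = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        att = enc_model(**enc).attentions[-1].mean(dim=1)[0, 0]  # (seq_len,)
    ids = enc["input_ids"][0].tolist()
    ranked = att.argsort(descending=True).tolist()
    content = [i for i in ranked if ids[i] not in tok.all_special_ids]
    return enc, content[:top_k]

def generate_candidates(text, top_k=3, n_subs=8):
    # Step 2: mask each keyword and let BERT's MLM head propose in-context
    # substitutions (true word-level perturbation would also need to handle
    # multi-piece words, which this sketch ignores).
    enc, positions = locate_keywords(text, top_k)
    cands = []
    for pos in positions:
        masked = enc["input_ids"].clone()
        masked[0, pos] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(input_ids=masked).logits[0, pos]
        for sub in logits.topk(n_subs).indices.tolist():
            ids = enc["input_ids"][0].tolist()
            ids[pos] = sub
            cands.append(tok.decode(ids, skip_special_tokens=True))
    return cands

def embed(texts):
    # [CLS] sentence embeddings used as clustering features (an assumption).
    enc = tok(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        return enc_model(**enc).last_hidden_state[:, 0].numpy()

def pick_influential_cluster(original, cands, n_clusters=3):
    # Step 3: cluster the candidates and keep the cluster whose centre drifts
    # farthest from the original embedding -- a stand-in for "more influence
    # on the classification result"; the paper's actual criterion may differ.
    X, o = embed(cands), embed([original])[0]
    km = KMeans(n_clusters=min(n_clusters, len(cands)), n_init=10).fit(X)
    far = ((km.cluster_centers_ - o) ** 2).sum(axis=1).argmax()
    return [c for c, lab in zip(cands, km.labels_) if lab == far]

# Illustrative usage: the returned pool is what would then be sent to the target model.
text = "the service was painfully slow and the food arrived cold"
adversarial_pool = pick_influential_cluster(text, generate_candidates(text))
```

Under this reading, only candidates from the selected cluster would ever be submitted to the target model, which is plausibly where the reported reduction in query counts comes from.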
