Text adversarial example generation method based on BERT model

Journal of Computer Applications, 2023, Vol. 43, Issue 10: 3093-3098. DOI: 10.11772/j.issn.1001-9081.2022091468

Special Issue: Artificial intelligence

Yuhang LI, Yuli YANG, Yao MA, Dan YU, Yongle CHEN

College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan Shanxi 030600, China

Received: 2022-10-08; Revised: 2023-02-19; Accepted: 2023-02-23; Online: 2023-04-17; Published: 2023-10-10
Contact: Yongle CHEN
About the authors:
LI Yuhang, born in 1998, from Linfen, Shanxi, M.S. candidate, CCF member. His research interests include artificial intelligence.
YANG Yuli, born in 1979, from Linfen, Shanxi, Ph.D., lecturer, CCF member. Her research interests include trusted cloud service computing and blockchain.
MA Yao, born in 1982, from Taiyuan, Shanxi, Ph.D., lecturer, CCF member. His research interests include Web security.
YU Dan, born in 1988, from Beijing, Ph.D., CCF member. Her research interests include wireless sensor networks and the Internet of Things.
Supported by: Basic Research Program of Shanxi Province (20210302123131)

Abstract

Existing adversarial example generation methods issue a large number of queries to the target model, which degrades attack performance. To address this problem, a Text Adversarial Example Generation Method based on the BERT (Bidirectional Encoder Representations from Transformers) model (TAEGM) was proposed. Firstly, an attention mechanism was used to locate the keywords that significantly influence the classification result, without querying the target model. Secondly, the BERT model was used to apply word-level perturbations to these keywords, generating candidate adversarial examples. Finally, the candidate examples were clustered, and adversarial examples were selected from the clusters with greater influence on the classification result. Experimental results on the Yelp Reviews, AG News, and IMDB Review datasets show that, compared with CLARE (ContextuaLized AdversaRial Example generation model), the method with the second-best attack Success Rate (SR), TAEGM reduces the Query Counts (QC) to the target model by 62.3% and the time consumption by 68.6% on average while maintaining the SR of adversarial attacks. Further experimental results verify that the adversarial examples generated by TAEGM have good transferability and can also improve model robustness through adversarial training.
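
The three stages named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the attention weights, the substitute proposer (standing in for a BERT masked language model), the embedding, and the target classifier are all hypothetical stand-ins, and the cluster selection rule used here (query one representative per cluster, then search only the most damaging cluster) is an assumption about how "clusters that have more influence" might be chosen.

```python
import numpy as np

def top_k_keywords(attention, k):
    """Stage 1: positions of the k tokens with the largest attention weight."""
    order = np.argsort(attention)[::-1][:k]
    return sorted(order.tolist())

def make_candidates(tokens, positions, propose):
    """Stage 2: substitute one keyword at a time with each proposed word."""
    return [tokens[:p] + [w] + tokens[p + 1:]
            for p in positions for w in propose(tokens, p)]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    k = min(k, len(points))
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def select_adversarial(cands, true_label, embed, target_prob, k_clusters=2):
    """Stage 3 (assumed rule): probe one representative per cluster, then
    search only the cluster whose representative hurts the true label most."""
    points = np.array([embed(c) for c in cands], dtype=float)
    labels = kmeans(points, k_clusters)
    reps = {j: int(np.flatnonzero(labels == j)[0])
            for j in np.unique(labels).tolist()}
    rep_probs = {j: target_prob(cands[i]) for j, i in reps.items()}
    queries = len(reps)  # one target-model query per cluster so far
    best = min(rep_probs, key=lambda j: rep_probs[j][true_label])
    for i in np.flatnonzero(labels == best).tolist():
        if i == reps[best]:
            probs = rep_probs[best]  # reuse the cached query
        else:
            probs = target_prob(cands[i])
            queries += 1
        if int(np.argmax(probs)) != true_label:
            return cands[i], queries
    return None, queries

# --- Toy stand-ins, all hypothetical ---
SUBSTITUTES = ["good", "fine", "wonderful", "terrific"]
EMBED = {"good": [1.0, 0.1], "fine": [0.9, 0.2],
         "wonderful": [0.2, 0.9], "terrific": [0.1, 1.0]}
# A deliberately brittle classifier that misreads two positive words.
WEIGHTS = {"great": 2.0, "good": 1.5, "fine": 1.0,
           "wonderful": -0.5, "terrific": -1.0}

def toy_propose(tokens, pos):
    return SUBSTITUTES

def toy_embed(tokens):
    return np.sum([EMBED.get(t, [0.0, 0.0]) for t in tokens], axis=0)

def toy_target_prob(tokens):
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    p_pos = 1.0 / (1.0 + np.exp(-score))
    return np.array([1.0 - p_pos, p_pos])  # [negative, positive]

tokens = ["the", "food", "was", "great"]
attention = [0.05, 0.25, 0.10, 0.60]  # made-up weights, not from a real model
positions = top_k_keywords(attention, k=1)
cands = make_candidates(tokens, positions, toy_propose)
adv, queries = select_adversarial(cands, 1, toy_embed, toy_target_prob)
print(positions, adv, queries)
```

In this toy run the keyword is found without any call to the target model, and only two of the four candidates are ever sent to it, which is the kind of query saving the abstract describes.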

Key words: adversarial example, attention mechanism, BERT (Bidirectional Encoder Representations from Transformers), adversarial attack, clustering algorithm

CLC Number: TP309

Yuhang LI, Yuli YANG, Yao MA, Dan YU, Yongle CHEN. Text adversarial example generation method based on BERT model[J]. Journal of Computer Applications, 2023, 43(10): 3093-3098.

Dataset	Method	ACC/%	SR/%	QC	Sim	SCR/%	Time/ms
Yelp Reviews	TextFooler	99.2	77.8	581.0	0.68	18.1	954.4
Yelp Reviews	TextHoaxer	99.2	78.0	800.3	0.73	9.3	1364.4
Yelp Reviews	CLARE	99.2	78.7	1391.6	0.78	10.6	3268.6
Yelp Reviews	TAEGM	99.2	89.9	675.3	0.80	8.9	1032.2
AG News	TextFooler	96.6	63.6	535.1	0.64	26.4	992.3
AG News	TextHoaxer	96.6	77.4	1100.2	0.77	7.9	1342.5
AG News	CLARE	96.6	82.1	2834.7	0.71	8.5	4031.5
AG News	TAEGM	96.6	82.3	822.9	0.78	7.5	1105.3
IMDB Review	TextFooler	96.1	77.6	584.3	0.74	15.4	833.5
IMDB Review	TextHoaxer	96.1	80.2	943.4	0.82	7.9	1443.2
IMDB Review	CLARE	96.1	82.6	1406.6	0.82	7.6	2677.6
IMDB Review	TAEGM	96.1	81.8	624.4	0.88	7.6	994.7
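
The 62.3% and 68.6% reductions quoted in the abstract can be checked against the CLARE and TAEGM rows of the table above. The short calculation below assumes the averages were obtained by pooling the totals over the three datasets. That pooling is an inference from the numbers, not something the page states, and the helper name `pooled_reduction` is made up for this example.

```python
# (QC, time in ms) per dataset, copied from the CLARE and TAEGM rows.
clare = {
    "Yelp Reviews": (1391.6, 3268.6),
    "AG News": (2834.7, 4031.5),
    "IMDB Review": (1406.6, 2677.6),
}
taegm = {
    "Yelp Reviews": (675.3, 1032.2),
    "AG News": (822.9, 1105.3),
    "IMDB Review": (624.4, 994.7),
}

def pooled_reduction(baseline, ours, column):
    """Percentage reduction of the summed column relative to the baseline."""
    base_total = sum(row[column] for row in baseline.values())
    our_total = sum(row[column] for row in ours.values())
    return 100.0 * (1.0 - our_total / base_total)

qc_reduction = pooled_reduction(clare, taegm, 0)
time_reduction = pooled_reduction(clare, taegm, 1)
print(f"QC: {qc_reduction:.1f}%  time: {time_reduction:.1f}%")  # QC: 62.3%  time: 68.6%
```

Averaging the three per-dataset reductions instead gives a slightly different figure for QC (about 59%), so the pooled reading is the one that matches the reported values.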

Generating model \ Attacked model	TEXTCNN1	TEXTCNN2	BERT
TEXTCNN1	98.0	68.7	65.3
TEXTCNN2	71.0	92.9	67.7
BERT	74.6	72.9	89.9
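
One way to read the transferability table is to compare each generating model's success on the other two models with its success on itself. The sketch below assumes the cells are attack success rates in percent (the BERT diagonal, 89.9, matches TAEGM's SR on Yelp Reviews). The "retained" ratio is an illustrative summary introduced here, not a metric taken from the paper.

```python
models = ["TEXTCNN1", "TEXTCNN2", "BERT"]
# Rows: model used to generate the examples; columns: attacked model.
sr = [
    [98.0, 68.7, 65.3],
    [71.0, 92.9, 67.7],
    [74.6, 72.9, 89.9],
]

summary = {}
for i, source in enumerate(models):
    transferred = [sr[i][j] for j in range(len(models)) if j != i]
    mean_transfer = sum(transferred) / len(transferred)
    # Share of the self-attack success that survives on the other models.
    summary[source] = (mean_transfer, mean_transfer / sr[i][i])

for source, (mean_transfer, retained) in summary.items():
    print(f"{source}: mean transfer SR {mean_transfer:.2f}%, retained {retained:.2f}")
```

Under this reading, examples generated against BERT transfer best on average, which is consistent with the abstract's transferability claim.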

Dataset	Training set size	Adversarial examples	ACC/%	SR/%
Yelp Reviews	124000	2500	98.0	53.7
AG News	124000	2500	94.7	51.0
IMDB Review	25000	2500	93.3	52.5
