Journal of Computer Applications, 2023, Vol. 43, Issue (10): 3093-3098. DOI: 10.11772/j.issn.1001-9081.2022091468
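Special topic: Artificial Intelligence
Received: 2022-10-08
Revised: 2023-02-19
Accepted: 2023-02-23
Online: 2023-04-17
Published: 2023-10-10
Corresponding author: Yongle CHEN
About the author: LI Yuhang, born in 1998 in Linfen, Shanxi, M. S. candidate, CCF member. His research interests include artificial intelligence.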
Yuhang LI, Yuli YANG, Yao MA, Dan YU, Yongle CHEN
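Abstract:
To address the problem that existing adversarial example generation methods must query the target model many times, which degrades attack performance, a Text Adversarial Example Generation Method (TAEGM) based on the BERT (Bidirectional Encoder Representations from Transformers) model is proposed. First, an attention mechanism is used to locate the keywords that significantly influence the classification result, without querying the target model. Second, the BERT model applies word-level perturbations to these keywords to generate candidate examples. Finally, the candidate examples are clustered, and adversarial examples are selected from the clusters that have a greater influence on the classification result. Experimental results on the Yelp Reviews, AG News, and IMDB Review datasets show that, compared with CLARE (ContextuaLized AdversaRial Example generation model), the method with the second-best attack Success Rate (SR), TAEGM reduces the Query Count (QC) to the target model by 62.3% on average and the running time by 68.6% on average while maintaining the SR. Further experiments verify that the adversarial examples generated by TAEGM transfer well across models and can also improve model robustness through adversarial training.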
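The first and third stages described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `rank_keywords`, `kmeans_labels`, `select_candidates`, and `confidence_fn` are hypothetical; the attention matrix is assumed to come from a local model rather than the target model; the BERT perturbation stage is skipped, so the candidates are toy strings with toy 2-D embeddings; and choosing a cluster by querying one representative per cluster is only one plausible reading of "selecting from the clusters with greater influence".

```python
from typing import Callable, List, Sequence, Tuple


def rank_keywords(tokens: Sequence[str],
                  attention: Sequence[Sequence[float]],
                  k: int) -> List[str]:
    """Return the k tokens that receive the highest mean attention.

    attention[i][j] is the weight token i places on token j. It is assumed
    to come from a local model, so no target-model query is needed.
    """
    n = len(tokens)
    received = [sum(attention[i][j] for i in range(n)) / n for j in range(n)]
    order = sorted(range(n), key=lambda j: received[j], reverse=True)
    return [tokens[j] for j in order[:k]]


def kmeans_labels(points: Sequence[Sequence[float]], k: int, iters: int = 10) -> List[int]:
    """Tiny k-means with a deterministic init (the first k points are the centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def select_candidates(candidates: Sequence[str],
                      embeddings: Sequence[Sequence[float]],
                      confidence_fn: Callable[[str], float],
                      k: int) -> Tuple[List[str], int]:
    """Query one representative per cluster and keep the most damaging cluster.

    confidence_fn(text) is the target model's confidence in the ORIGINAL
    label (hypothetical interface); lower means a stronger attack.
    Returns (selected candidates, number of target-model queries).
    """
    labels = kmeans_labels(embeddings, k)
    best_cluster, best_conf, queries = -1, float("inf"), 0
    for cluster in sorted(set(labels)):
        representative = candidates[labels.index(cluster)]
        conf = confidence_fn(representative)  # one target-model query
        queries += 1
        if conf < best_conf:
            best_cluster, best_conf = cluster, conf
    chosen = [c for c, lab in zip(candidates, labels) if lab == best_cluster]
    return chosen, queries


# Toy data: every value below is made up for illustration.
tokens = ["the", "food", "was", "terrible"]
attention = [
    [0.1, 0.3, 0.1, 0.5],
    [0.1, 0.2, 0.1, 0.6],
    [0.2, 0.2, 0.1, 0.5],
    [0.1, 0.3, 0.1, 0.5],
]
print(rank_keywords(tokens, attention, k=2))

candidates = ["cand_a", "cand_b", "cand_c", "cand_d"]
embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_confidence = {"cand_a": 0.9, "cand_b": 0.8, "cand_c": 0.3, "cand_d": 0.2}
chosen, queries = select_candidates(candidates, embeddings, toy_confidence.get, k=2)
print(chosen, queries)
```

In this sketch the number of target-model queries grows with the number of clusters rather than with the number of candidates, which is the kind of saving the abstract attributes to clustering.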
Yuhang LI, Yuli YANG, Yao MA, Dan YU, Yongle CHEN. Text adversarial example generation method based on BERT model[J]. Journal of Computer Applications, 2023, 43(10): 3093-3098.
Dataset | Number of labels | Training samples | Test samples | Average words | Task |
---|---|---|---|---|---|
Yelp Reviews | 2 | 522 000 | 38 000 | 623.3 | Sentiment classification |
AG News | 4 | 124 000 | 7 600 | 278.6 | News classification |
IMDB Review | 2 | 25 000 | 2 500 | 325.6 | Sentiment classification |
Tab. 1 Details of three datasets
Dataset | Method | ACC/% | SR/% | QC | Sim | SCR/% | Time/ms |
---|---|---|---|---|---|---|---|
Yelp Reviews | TextFooler | 99.2 | 77.8 | 581.0 | 0.68 | 18.1 | 954.4 |
Yelp Reviews | TextHoaxer | 99.2 | 78.0 | 800.3 | 0.73 |  | 1 364.4 |
Yelp Reviews | CLARE |  |  | 1 391.6 |  | 10.6 | 3 268.6 |
Yelp Reviews | TAEGM | 99.2 | 89.9 |  | 0.80 | 8.9 |  |
AG News | TextFooler | 96.6 | 63.6 | 535.1 | 0.64 | 26.4 | 992.3 |
AG News | TextHoaxer | 96.6 | 77.4 | 1 100.2 |  |  | 1 342.5 |
AG News | CLARE | 96.6 |  | 2 834.7 | 0.71 | 8.5 | 4 031.5 |
AG News | TAEGM | 96.6 | 82.3 |  | 0.78 | 7.5 |  |
IMDB Review | TextFooler | 96.1 | 77.6 | 584.3 | 0.74 | 15.4 | 833.5 |
IMDB Review | TextHoaxer | 96.1 | 80.2 | 943.4 |  |  | 1 443.2 |
IMDB Review | CLARE | 96.1 | 82.6 | 1 406.6 |  | 7.6 | 2 677.6 |
IMDB Review | TAEGM | 96.1 |  |  | 0.88 | 7.6 |  |
Tab. 2 Performance comparison of four methods performing adversarial attacks on three datasets
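The SR and QC columns, and the average reductions quoted in the abstract, can be computed from per-sample attack logs roughly as follows. This is a sketch under assumptions: `AttackRecord` and the three helper functions are hypothetical names, SR is taken to be the share of attacked samples whose prediction was flipped, QC is taken to be the mean number of target-model queries per sample, and all numbers in the usage lines are made up rather than taken from the table.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AttackRecord:
    succeeded: bool  # did the adversarial example flip the target model's prediction?
    queries: int     # target-model queries spent on this sample


def success_rate(records: Sequence[AttackRecord]) -> float:
    """Attack success rate in percent (assumed definition)."""
    return 100.0 * sum(r.succeeded for r in records) / len(records)


def mean_queries(records: Sequence[AttackRecord]) -> float:
    """Average number of target-model queries per attacked sample."""
    return sum(r.queries for r in records) / len(records)


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Made-up log of four attacked samples.
log = [
    AttackRecord(True, 300),
    AttackRecord(True, 500),
    AttackRecord(False, 400),
    AttackRecord(True, 400),
]
print(success_rate(log), mean_queries(log))
print(relative_reduction(baseline=1000.0, ours=mean_queries(log)))
```

An "average reduction" over several datasets would then be the mean of `relative_reduction` evaluated once per dataset, assuming that is how the reported averages were formed.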
Label | Text |
---|---|
Negative→Positive | Stay away from the sirloin dishes. People (I)【BERT_Replace】 don’t know what the heck is in this—usually【BERT_Insert】 it tastes like compacted beef, shred up and repackaged to look like a steak…. lolI(not)【BERT_Replace】 I’d even say the sirloin was nuked after reading these reviews. :)(Disgusting!)【BERT_Merge】 |
Negative→Positive | Ok, I know it’s Vegas and everything is expensive, but oh【BERT_Insert】 these were no(just mediocre)【BERT_Merge】 over priced deli sandwiches and small soggy potato pancakes. However, as in most casino spots, the staff trips above (over)【BERT_Replace】 themselves to make sure that you have everything that you need and that you aren’t waiting for good service. |
Tab. 3 Display of adversarial examples generated by TAEGM on BERT
Model used to generate adversarial examples | Attacked model: TEXTCNN1 | Attacked model: TEXTCNN2 | Attacked model: BERT |
---|---|---|---|
TEXTCNN1 | 98.0 | 68.7 | 65.3 |
TEXTCNN2 | 71.0 | 92.9 | 67.7 |
BERT | 74.6 | 72.9 | 89.9 |
Tab. 4 Success rate of transferable attacks on Yelp Reviews dataset (%)
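A matrix like the one in Tab. 4 can be produced by replaying the adversarial examples crafted against each source model on every model. The following is a minimal sketch, assuming that transfer success is measured as the share of those examples that the attacked model misclassifies; `transfer_success_rate`, `transfer_matrix`, and the toy keyword classifiers are hypothetical and only stand in for the real models.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (adversarial text, true label)


def transfer_success_rate(adv_examples: List[Example],
                          predict: Callable[[str], int]) -> float:
    """Percentage of adversarial examples that `predict` gets wrong."""
    fooled = sum(1 for text, label in adv_examples if predict(text) != label)
    return 100.0 * fooled / len(adv_examples)


def transfer_matrix(adv_by_source: Dict[str, List[Example]],
                    models: Dict[str, Callable[[str], int]]) -> Dict[str, Dict[str, float]]:
    """matrix[source][target] = success rate of source-crafted examples on target."""
    return {
        source: {target: transfer_success_rate(examples, predict)
                 for target, predict in models.items()}
        for source, examples in adv_by_source.items()
    }


# Toy stand-ins for real classifiers (label 1 = positive, 0 = negative).
models = {
    "A": lambda text: 1 if "good" in text else 0,
    "B": lambda text: 1 if "good" in text and "not" not in text else 0,
}
# Made-up negative reviews that were perturbed to fool model A.
adv_by_source = {"A": [("good grief, awful", 0), ("so good, not", 0)]}

matrix = transfer_matrix(adv_by_source, models)
print(matrix)
```

The diagonal entries correspond to attacking the same model that the examples were generated on, which is why they are the largest values in each row of the table.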
Dataset | Training samples | Adversarial examples | ACC/% | SR/% |
---|---|---|---|---|
Yelp Reviews | 124 000 | 2 500 | 98.0 | 53.7 |
AG News | 124 000 | 2 500 | 94.7 | 51.0 |
IMDB Review | 25 000 | 2 500 | 93.3 | 52.5 |
Tab. 5 Adversarial training results of TAEGM on three datasets
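Adversarial training of the kind summarized in Tab. 5 is commonly set up by adding adversarial examples, each carrying the label of the clean sample it was derived from, to the training set before retraining. The sketch below assumes that setup; `build_adversarial_training_set` is a hypothetical helper and the two tiny datasets are made up.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]  # (text, label)


def build_adversarial_training_set(clean: Sequence[Example],
                                   adversarial: Sequence[Example],
                                   seed: int = 0) -> List[Example]:
    """Append adversarial examples (with their ORIGINAL labels) and shuffle."""
    mixed = list(clean) + list(adversarial)
    random.Random(seed).shuffle(mixed)  # seeded so the mix is reproducible
    return mixed


clean = [("great food", 1), ("terrible service", 0)]
adversarial = [("awful service", 0)]  # perturbed text, original label kept
train_set = build_adversarial_training_set(clean, adversarial)
print(len(train_set))
```

The model would then be retrained on `train_set` and attacked again; a lower SR after retraining indicates improved robustness.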