Journal of Computer Applications (official website) ›› 2025, Vol. 45 ›› Issue (10): 3074-3082. DOI: 10.11772/j.issn.1001-9081.2024091307

• Artificial Intelligence •

  • Supported by:
    National Natural Science Foundation of China (62466045); Natural Science Foundation of Inner Mongolia (2023MS06012); Special Funds for Basic Scientific Research of Universities Directly under the Inner Mongolia Autonomous Region (2023RCTD027); Special Funds for Basic Scientific Research of Universities Directly under the Inner Mongolia Autonomous Region (2024QNJS047)

Multimodal adversarial example generation method for Chinese text classification

Yongping WANG1(), Yao LIU2, Xiaolin ZHANG2, Jingyu WANG2, Lixin LIU3   

  1. School of Automation and Electrical Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China
    2. School of Digital Intelligent Industry (School of Cyber Science and Technology), Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China
    3. School of Information, Renmin University of China, Beijing 100872, China
  • Received:2024-09-06 Revised:2025-02-23 Accepted:2025-02-27 Online:2025-03-26 Published:2025-10-10
  • Contact: Yongping WANG
  • About author:王永平(1984—),女,内蒙古赤峰人,讲师,硕士,主要研究方向:人工智能安全、大数据隐私保护 imust_wyp@163.com
    LIU Yao, born in 1999, M. S. candidate. Her research interests include artificial intelligence security.
    ZHANG Xiaolin, born in 1966, Ph. D., professor. Her research interests include artificial intelligence security, big data privacy protection.
    WANG Jingyu, born in 1976, Ph. D., professor. His research interests include big data and security, blockchain and security.
    LIU Lixin, born in 1983, Ph. D. candidate, lecturer. Her research interests include data security, privacy protection, blockchain, database.


Abstract:

To address the problem that existing Chinese text adversarial example generation methods rely on a single important-word localization method and a single transformation strategy, which makes it difficult to improve the attack success rate and the quality of adversarial examples, a multimodal adversarial example generation method for Chinese text classification was proposed from the perspectives of the morphology, pronunciation, and semantics of Chinese characters. In the word-importance calculation stage, a masked model and the victim model's output were used to obtain confidence probabilities, the dispersion of the predicted words was calculated as the sensitivity of each position, and the two were combined to determine the perturbation priority. In the adversarial transformation stage, a multimodal attack strategy combining the glyph, pronunciation, and semantic features of Chinese characters was designed to generate adversarial examples, with candidate examples generated by a lexicon, a Convolutional Neural Network (CNN)-based glyph similarity comparison model, and a Masked Language Model (MLM). Experimental results show that the proposed method achieves attack success rates of 33.2% to 65.8% against the relatively robust BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly optimized BERT pretraining approach) models, and that the generated adversarial examples can improve model robustness through adversarial training.
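The two-stage pipeline the abstract describes can be sketched in miniature. The toy classifier, the hand-written candidate table, and the entropy-based dispersion measure below are illustrative stand-ins, not the paper's implementation (which uses BERT-style victim models, a pinyin lexicon, a CNN glyph-similarity model, and an MLM for candidate generation):

```python
import math

# Toy stand-in classifier: the word "好" drives a positive prediction.
def toy_classify(words):
    pos = 0.9 if "好" in words else 0.2
    return [pos, 1.0 - pos]   # [P(positive), P(negative)]

def word_importance(words, classify, mask_token="[MASK]"):
    """Rank positions by perturbation priority: the confidence drop when a
    position is masked, plus the dispersion (here: output entropy) of the
    classifier's distribution at that masked position."""
    base = classify(words)
    label = base.index(max(base))
    scores = []
    for i in range(len(words)):
        probs = classify(words[:i] + [mask_token] + words[i + 1:])
        drop = base[label] - probs[label]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        scores.append((drop + entropy, i))
    # Higher combined score = higher perturbation priority.
    return [i for _, i in sorted(scores, reverse=True)]

# Toy multimodal candidate table: a homophone, a similar glyph, and a
# synonym stand in for the pinyin lexicon, glyph model, and MLM outputs.
CANDIDATES = {"好": ["郝", "妤", "棒"]}

def attack(words, classify):
    """Greedily substitute words in priority order until the label flips."""
    label = classify(words).index(max(classify(words)))
    for i in word_importance(words, classify):
        for cand in CANDIDATES.get(words[i], []):
            adv = words[:i] + [cand] + words[i + 1:]
            if classify(adv).index(max(classify(adv))) != label:
                return adv   # adversarial example found
    return None

print(word_importance(["这", "部", "电影", "好"], toy_classify))  # → [3, 2, 1, 0]
print(attack(["这", "部", "电影", "好"], toy_classify))  # → ['这', '部', '电影', '郝']
```

In this toy run, masking "好" both drops the positive-class confidence and raises the output entropy, so position 3 is perturbed first, and the homophone substitution flips the predicted label.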

Key words: deep learning, text classification, adversarial example, multimodal, adversarial attack

CLC number: