Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (7): 2080-2086. DOI: 10.11772/j.issn.1001-9081.2023071056

• Cyber security •

DKP: defending against model stealing attacks based on dark knowledge protection

Zhi ZHANG, Xin LI, Naifu YE, Kaixi HU

  1. Academy of Information Network Security, People's Public Security University of China, Beijing 100038, China
  • Received: 2023-08-04 Revised: 2023-10-01 Accepted: 2023-10-10 Online: 2023-10-26 Published: 2024-07-10
  • Contact: Xin LI
  • About author: ZHANG Zhi, born in 1999 in Lüliang, Shanxi, M.S. candidate. His research interests include cyberspace security.
    YE Naifu, born in 1999 in Jinan, Shandong, M.S. candidate. His research interests include cyberspace security and natural language processing.
    HU Kaixi, born in 2000 in Pingdingshan, Henan, M.S. candidate. Her research interests include cyberspace security.
    Corresponding author: LI Xin, born in 1977 in Ganzhou, Jiangxi, Ph.D., professor, CCF member. His research interests include cloud computing and network security.
  • Supported by:
    National Key Research and Development Program of China (2020AAA0107705)

Abstract:

In black-box scenarios, the use of model function stealing methods to generate pirated models poses a serious threat to the security and intellectual property protection of models deployed in the cloud. Existing model stealing defense techniques, such as perturbation and label softening (variable temperature), may change which category holds the maximum confidence value in the model output and thereby degrade the model's performance on its original task. To solve this problem, a model stealing defense method based on dark knowledge protection, called DKP (defending against model stealing attacks based on Dark Knowledge Protection), was proposed. First, the cloud model to be protected was used to process the test samples, yielding their initial confidence distribution vectors. Then, a dark knowledge protection layer was added after the model's output layer, and the initial confidence distribution vector was perturbed through a partitioned temperature-regulated softmax mechanism. Finally, the defended confidence distribution vector was obtained, reducing the risk of model information leakage. The proposed method achieved significant defensive effects on four public datasets; in particular, on the blog dataset it reduced the accuracy of the pirated model by 17.4 percentage points, whereas perturbing the posterior probabilities with noise reduced it by only about 2 percentage points. The experimental results show that the proposed method solves the problems of existing active defense methods such as perturbation and label softening: by perturbing the category probability distribution features of the cloud model's output without changing the classification results of the test samples, it successfully reduces the accuracy of the pirated model and reliably guarantees the confidentiality of the cloud model.
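The abstract describes the defense only at a high level: a dark knowledge protection layer placed after the output layer applies a partitioned, temperature-regulated softmax that perturbs the confidence distribution while leaving the top-1 class unchanged. The Python/NumPy sketch below illustrates one way such a layer could behave; the function name dkp_defend, the two-way partition (winning class vs. all other classes), and the temperatures t_top and t_rest are illustrative assumptions, not details taken from the paper.

import numpy as np

def dkp_defend(logits, t_top=1.0, t_rest=4.0):
    """Hypothetical sketch of a partitioned temperature-regulated softmax.

    The winning logit keeps a sharp temperature (t_top) so the predicted
    class is unchanged, while the remaining logits are softened (t_rest),
    flattening the inter-class "dark knowledge" in the output vector.
    """
    z = np.asarray(logits, dtype=np.float64)
    top = int(z.argmax())
    z = z - z.max()                      # winning logit -> 0, all others <= 0

    temps = np.full_like(z, t_rest)      # partition: soft temperature for the rest,
    temps[top] = t_top                   # sharp temperature for the winning class

    probs = np.exp(z / temps)
    probs /= probs.sum()

    # The defended vector must preserve the original classification result.
    assert int(probs.argmax()) == top
    return probs

# Example: the argmax (class 0) survives, but the tail of the distribution
# that a distillation-based attacker would learn from is flattened.
defended = dkp_defend([3.2, 1.1, 0.4, -0.7])

Because only the non-winning logits are softened, the similarity structure among the losing classes, the dark knowledge from which an attacker distils a pirated model, is distorted, while the classification result is preserved by construction.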

Key words: deep learning, black-box scenarios, cloud-based model, model function stealing, model stealing defense, dark knowledge protection
