Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (7): 2080-2086. DOI: 10.11772/j.issn.1001-9081.2023071056

• Cyber security •

DKP: defending against model stealing attacks based on dark knowledge protection

Zhi ZHANG, Xin LI, Naifu YE, Kaixi HU

  1. Academy of Information Network Security, People's Public Security University of China, Beijing 100038, China
  • Received: 2023-08-04 Revised: 2023-10-01 Accepted: 2023-10-10 Online: 2023-10-26 Published: 2024-07-10
  • Contact: Xin LI
  • About author: ZHANG Zhi, born in 1999 in Lüliang, Shanxi, M.S. candidate. His research interests include cyberspace security.
    YE Naifu, born in 1999 in Jinan, Shandong, M.S. candidate. His research interests include cyberspace security and natural language processing.
    HU Kaixi, born in 2000 in Pingdingshan, Henan, M.S. candidate. Her research interests include cyberspace security.
    Corresponding author: LI Xin, born in 1977 in Ganzhou, Jiangxi, Ph.D., professor, CCF member. His research interests include cloud computing and network security.
  • Supported by:
    National Key Research and Development Program of China (2020AAA0107705)

Abstract:

In black-box scenarios, the use of model function stealing methods to generate pirated models poses a serious threat to the security and intellectual property protection of models deployed in the cloud. Existing model stealing defense techniques, such as perturbation and label softening (variable temperature), may change which category holds the maximum confidence value in the model output and thereby degrade the model's performance on its original task. To solve this problem, a model stealing defense method based on dark knowledge protection, called DKP (defending against model stealing attacks based on Dark Knowledge Protection), was proposed. First, the cloud model to be protected was used to process the test samples, yielding their initial confidence distribution vectors. Then, a dark knowledge protection layer was added after the model's output layer, and the initial confidence distribution vector was perturbed through a partitioned temperature-regulated softmax mechanism. Finally, the defended confidence distribution vector was obtained, reducing the risk of model information leakage. The proposed method achieved significant defensive effects on four public datasets; in particular, on the blog dataset it reduced the accuracy of the pirated model by 17.4 percentage points, whereas perturbing the posterior probabilities with noise reduced it by only about 2 percentage points. The experimental results show that the proposed method solves the problems of existing active defense methods such as perturbation and label softening: by perturbing the category probability distribution features of the cloud model's output without changing the classification results of the test samples, it successfully reduces the accuracy of the pirated model and reliably guarantees the confidentiality of the cloud model.
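The abstract describes the defense only at a high level: a dark knowledge protection layer placed after the output layer applies a partitioned, temperature-regulated softmax that perturbs the confidence distribution while leaving the top-1 class unchanged. The Python/NumPy sketch below illustrates one way such a layer could behave; the function name dkp_defend, the two-way partition (winning class vs. all other classes), and the temperatures t_top and t_rest are illustrative assumptions, not details taken from the paper.

import numpy as np

def dkp_defend(logits, t_top=1.0, t_rest=4.0):
    """Hypothetical sketch of a partitioned temperature-regulated softmax.

    The winning logit keeps a sharp temperature (t_top) so the predicted
    class is unchanged, while the remaining logits are softened (t_rest),
    flattening the inter-class "dark knowledge" in the output vector.
    """
    z = np.asarray(logits, dtype=np.float64)
    top = int(z.argmax())
    z = z - z.max()                      # winning logit -> 0, all others <= 0

    temps = np.full_like(z, t_rest)      # partition: soft temperature for the rest,
    temps[top] = t_top                   # sharp temperature for the winning class

    probs = np.exp(z / temps)
    probs /= probs.sum()

    # The defended vector must preserve the original classification result.
    assert int(probs.argmax()) == top
    return probs

# Example: the argmax (class 0) survives, but the tail of the distribution
# that a distillation-based attacker would learn from is flattened.
defended = dkp_defend([3.2, 1.1, 0.4, -0.7])

Because only the non-winning logits are softened, the similarity structure among the losing classes, the dark knowledge from which an attacker distils a pirated model, is distorted, while the classification result is preserved by construction.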

Key words: deep learning, black-box scenarios, cloud-based model, model function stealing, model stealing defense, dark knowledge protection
