Journal of Computer Applications, 2023, Vol. 43, Issue (7): 2065-2072. DOI: 10.11772/j.issn.1001-9081.2022071114

Special Issue: The 39th CCF National Database Conference (NDBC 2022)

• The 39th CCF National Database Conference (NDBC 2022) •

Differential privacy generative adversarial network algorithm with dynamic gradient threshold clipping

Shaoquan CHEN, Jianping CAI, Lan SUN

  1. College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian 350108, China
  • Received: 2022-07-12 Revised: 2022-08-10 Accepted: 2022-08-15 Online: 2023-07-20 Published: 2023-07-10
  • Contact: Jianping CAI
  • About author: CHEN Shaoquan, born in 1996, M. S. candidate. His research interests include machine learning, differential privacy.
    CAI Jianping, born in 1990, Ph. D. candidate. His research interests include differential privacy, federated learning, machine learning.
    SUN Lan, born in 1978, M. S., lecturer. Her research interests include data security, privacy protection.

动态梯度阈值裁剪的差分隐私生成对抗网络算法

陈少权, 蔡剑平, 孙岚

  1. 福州大学 计算机与大数据学院,福州 350108
  • 通讯作者: 蔡剑平
  • 作者简介:陈少权(1996—),男,福建泉州人,硕士研究生,CCF学生会员,主要研究方向:机器学习、差分隐私;
    蔡剑平(1990—),男,福建漳州人,博士研究生,主要研究方向:差分隐私、联邦学习、机器学习;
    孙岚(1978—),女,福建福州人,讲师,硕士,主要研究方向:数据安全、隐私保护。

Abstract:

Most existing methods that combine Generative Adversarial Networks (GANs) with differential privacy achieve privacy protection through gradient perturbation: during optimization, gradient clipping bounds the sensitivity of the optimizer to any single data record, and random noise is added to the clipped gradient to protect the model. However, most of these methods treat the clipping threshold as a fixed parameter during training, and a threshold that is either too large or too small degrades model performance. To solve this problem, DGC_DPGAN (Dynamic Gradient Clipping Differential Privacy Generative Adversarial Network), an algorithm with dynamic gradient threshold clipping, is proposed to balance privacy protection and model performance. Combined with pre-training, the algorithm first computes, during optimization, the mean Frobenius norm (F-norm) of the gradients for each batch of private data and uses it as the dynamic gradient clipping threshold; it then perturbs the gradient. Considering different clipping orders, two variants are proposed: CLIP_DGC_DPGAN (Clip Dynamic Gradient Clipping Differential Privacy Generative Adversarial Network), which clips first and then adds noise, and DGC_DPGAN, which adds noise first and then clips. The Rényi Accountant is used to calculate the privacy loss. Experimental results show that, under the same privacy budget, the two proposed dynamic gradient clipping algorithms outperform the fixed-threshold gradient clipping method. On the MNIST dataset, the two proposed algorithms improve the Inception Score (IS), Structural SIMilarity (SSIM), and Convolutional Neural Network (CNN) classification accuracy by 0.32 to 3.92, 0.03 to 0.27, and 7% to 44%, respectively; on the Fashion-MNIST dataset, they improve the IS, SSIM, and CNN classification accuracy by 0.40 to 4.32, 0.01 to 0.44, and 20% to 51%, respectively. The images generated by the GAN model are also more usable.
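The two clipping orders described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flattened-list gradient representation, the noise standard deviation (`noise_multiplier` times the threshold), and the point at which noise is added are all assumptions made for illustration. Pre-training and Rényi privacy accounting are omitted, so the sketch makes no privacy claim on its own.

```python
import math
import random


def l2_norm(vec):
    """Frobenius (L2) norm of a flattened gradient."""
    return math.sqrt(sum(x * x for x in vec))


def dynamic_threshold(per_sample_grads):
    """Mean gradient norm over the batch, used as the clipping threshold."""
    return sum(l2_norm(g) for g in per_sample_grads) / len(per_sample_grads)


def clip(vec, threshold):
    """Rescale a gradient so that its norm is at most `threshold`."""
    norm = l2_norm(vec)
    scale = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def add_noise(vec, std, rng):
    """Add independent Gaussian noise to every coordinate."""
    return [x + rng.gauss(0.0, std) for x in vec]


def clip_then_noise(per_sample_grads, noise_multiplier, rng):
    """CLIP_DGC_DPGAN-style order (assumed details): clip, sum, add noise, average."""
    c = dynamic_threshold(per_sample_grads)
    clipped = [clip(g, c) for g in per_sample_grads]
    summed = [sum(col) for col in zip(*clipped)]
    noisy = add_noise(summed, noise_multiplier * c, rng)
    return [x / len(per_sample_grads) for x in noisy], c


def noise_then_clip(per_sample_grads, noise_multiplier, rng):
    """DGC_DPGAN-style order (assumed details): add noise per sample, clip, average."""
    c = dynamic_threshold(per_sample_grads)
    noisy = [add_noise(g, noise_multiplier * c, rng) for g in per_sample_grads]
    clipped = [clip(g, c) for g in noisy]
    n = len(clipped)
    return [sum(col) / n for col in zip(*clipped)], c


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[3.0, 4.0], [0.0, 1.0]]  # two toy per-sample gradients
    print(clip_then_noise(batch, 1.0, rng))
    print(noise_then_clip(batch, 1.0, rng))
```

In this toy batch the per-sample norms are 5 and 1, so the dynamic threshold is 3: the first gradient is scaled down and the second is left unchanged. A fixed threshold would need manual retuning as gradient magnitudes drift during training, which is the problem the abstract describes.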

Key words: Generative Adversarial Network (GAN), differential privacy, dynamic gradient threshold clipping, Rényi Accountant

摘要:

现有的生成对抗网络(GAN)和差分隐私相结合的方法大多采用梯度扰动的方法实现隐私保护,即在优化过程中利用梯度裁剪技术来约束优化器对单个数据的敏感性,并对裁剪后的梯度添加随机噪声以达到保护模型的目的。然而大多数方法在训练时裁剪阈值固定,而阈值过大或过小均会影响模型的性能。针对该问题,提出动态梯度阈值裁剪的DGC_DPGAN (Dynamic Gradient Clipping Differential Privacy Generative Adversarial Network)算法以兼顾隐私保护和模型的性能。该算法结合预训练技术,在优化过程中先求取每批次隐私数据的梯度F-范数均值作为动态梯度裁剪阈值,再对梯度进行扰动。考虑不同的裁剪顺序,提出先裁剪再加噪的CLIP_DGC_DPGAN (Clip Dynamic Gradient Clipping Differential Privacy Generative Adversarial Network)算法和先加噪再裁剪的DGC_DPGAN算法,并采用Rényi Accountant求取隐私损失。实验结果表明,在相同的隐私预算下,所提出的两种动态梯度裁剪算法与固定梯度阈值裁剪方法相比更优:在Mnist数据集上,所提两种算法在IS(Inception Score)、结构相似性(SSIM)、卷积神经网络(CNN)分类准确率上分别提升了0.32~3.92,0.03~0.27,7%~44%;在Fashion-Mnist数据集上,所提两种算法在IS、SSIM、CNN分类准确率上分别提升了0.40~4.32,0.01~0.44,20%~51%。同时,GAN模型生成图像的可用性更好。

关键词: 生成对抗网络, 差分隐私, 动态梯度阈值裁剪, Rényi Accountant

CLC Number: