Journal of Computer Applications, 2026, Vol. 46, Issue (1): 21-32. DOI: 10.11772/j.issn.1001-9081.2024121817

• Artificial intelligence

Data augmentation scheme based on conditional generative adversarial network in federated learning

Yinlong JIAN1,2,3, Xuebin CHEN1,2,3, Zhongrui JING1,2,3, Qi ZHONG1,2,3, Zhenbo ZHANG1,2,3

  1. College of Science, North China University of Science and Technology, Tangshan Hebei 063210, China
    2. Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063210, China
    3. Tangshan Key Laboratory of Data Science (North China University of Science and Technology), Tangshan Hebei 063210, China
  • Received: 2024-12-27 Revised: 2025-03-04 Accepted: 2025-03-10 Online: 2026-01-10 Published: 2026-01-10
  • Contact: Xuebin CHEN
  • About the authors: JIAN Yinlong, born in 2001, M. S. candidate. His research interests include data security and privacy protection.
    JING Zhongrui, born in 2000, M. S. candidate. His research interests include data security and privacy protection.
    ZHONG Qi, born in 1999, M. S. candidate. Her research interests include data security and privacy protection.
    ZHANG Zhenbo, born in 1999, M. S. candidate. His research interests include data security and privacy protection.
  • Supported by:
    National Natural Science Foundation of China (U20A20179)

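Data augmentation scheme based on conditional generative adversarial network in federated learning (联邦学习中基于条件生成对抗网络的数据增强方案)

JIAN Yinlong (菅银龙)1,2,3, CHEN Xuebin (陈学斌)1,2,3, JING Zhongrui (景忠瑞)1,2,3, ZHONG Qi (钟琪)1,2,3, ZHANG Zhenbo (张镇博)1,2,3

  1. College of Science, North China University of Science and Technology, Tangshan, Hebei 063210
    2. Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan, Hebei 063210
    3. Tangshan Key Laboratory of Data Science (North China University of Science and Technology), Tangshan, Hebei 063210
  • Corresponding author: CHEN Xuebin (陈学斌)
  • About the authors: JIAN Yinlong (菅银龙), born in 2001, male, from Shangqiu, Henan, M. S. candidate, CCF member. His research interests include data security and privacy protection.
    JING Zhongrui (景忠瑞), born in 2000, male, from Linfen, Shanxi, M. S. candidate, CCF member. His research interests include data security and privacy protection.
    ZHONG Qi (钟琪), born in 1999, female, from Zhangjiakou, Hebei, M. S. candidate, CCF member. Her research interests include data security and privacy protection.
    ZHANG Zhenbo (张镇博), born in 1999, male, from Jinan, Shandong, M. S. candidate, CCF member. His research interests include data security and privacy protection.
  • Supported by:
    National Natural Science Foundation of China (U20A20179)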

Abstract:

To address the slow convergence and low model accuracy of federated learning in non-independent and identically distributed (Non-IID) scenarios, a Data Augmentation scheme based on conditional Generative Adversarial Network in Federated learning (FDA-GAN) is proposed. First, a conditional generator with class selection is designed: an independent network module is added for each class, and the label is used as conditional information, so that class-specific features are extracted more accurately. Second, a class-covering client selection strategy is proposed: based on the comprehensive reward of each client, a client set covering as many classes as possible is selected for training, so that the Generative Adversarial Network (GAN) can learn the complete class distribution. Finally, the generated samples are used to augment the clients' local datasets, which optimizes the feature composition of the local data and reduces the bias between clients. Experimental results show that, under Dirichlet (DIR) data partitioning and compared with CAP-GAN (Collaborated gAme Parallel learning based on GAN), FDA-GAN improves the MNIST Score (MNIST inception Score) and Mode Score by 2.67 and 1.08, respectively, and reduces the FID (Fréchet Inception Distance) and MMD (Maximum Mean Discrepancy) by 55.12 and 2.56, respectively. In different Non-IID scenarios, the FedAvg (Federated Averaging) and FedProx (Federated Proximal) algorithms, when combined with FDA-GAN, converge within 50 communication rounds, with accuracy improvements of at least 30.36 percentage points. These results indicate that FDA-GAN improves the quality and diversity of generated samples and, when combined with baseline algorithms, significantly improves the accuracy and convergence speed of the federated model.

Key words: Generative Adversarial Network (GAN), federated learning, Non-Independent and Identically Distributed (Non-IID), client selection, data augmentation
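
The abstract mentions Dirichlet data partitioning and a client selection strategy that tries to cover as many classes as possible. The following is only a minimal sketch of those two ideas, not the paper's algorithm: the function names `dirichlet_partition` and `select_clients`, the greedy coverage rule, and the use of local dataset size as a stand-in for the "comprehensive reward" are all assumptions made for illustration, since the abstract does not define them.

```python
import random

def dirichlet_partition(labels, num_clients, alpha, rng):
    """Split sample indices across clients, class by class, with Dirichlet(alpha) shares."""
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet draw is a set of independent Gamma(alpha, 1) draws, normalised.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights) or 1.0
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += weights[c] / total
            # The last client takes whatever is left, so no sample is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

def select_clients(client_classes, rewards, k):
    """Greedily pick up to k clients: most newly covered classes first, then higher reward."""
    covered, chosen = set(), []
    remaining = set(client_classes)
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda c: (len(client_classes[c] - covered), rewards[c], -c),
        )
        chosen.append(best)
        covered |= client_classes[best]
        remaining.remove(best)
    return chosen, covered

# Toy setup (hypothetical): 1000 samples, 10 balanced classes, 20 clients.
rng = random.Random(0)
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=20, alpha=0.5, rng=rng)
client_classes = {c: {labels[i] for i in idxs} for c, idxs in enumerate(parts)}
rewards = {c: len(idxs) for c, idxs in enumerate(parts)}  # placeholder reward
chosen, covered = select_clients(client_classes, rewards, k=5)
print("selected clients:", chosen)
print("classes covered:", sorted(covered))
```

A smaller `alpha` makes each client's label distribution more skewed, which is the usual way to vary the degree of Non-IID-ness in such experiments. The greedy rule here puts class coverage first and uses the reward only to break ties; how the paper actually weighs the two is not stated in the abstract.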

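Abstract (translated from the Chinese version):

To address the challenges faced by federated learning systems in non-independent and identically distributed (Non-IID) scenarios, such as slow convergence and reduced model accuracy, a Data Augmentation scheme based on conditional Generative Adversarial Network in Federated learning (FDA-GAN) is proposed. First, a conditional generator with class selection is designed, which adds an independent network module for each class and uses the label as conditional information to extract the specific features of each class more precisely. Second, a class-covering client selection strategy is proposed, which, based on the comprehensive reward of the clients, selects a client set containing as many classes as possible to participate in training, ensuring that the Generative Adversarial Network (GAN) can learn the complete class distribution. Finally, the generated samples are used to expand the clients' local datasets, optimizing the feature composition of the local data and reducing the bias between clients. Experimental results show that, under Dirichlet data partitioning and compared with CAP-GAN (Collaborated gAme Parallel learning based on GAN), FDA-GAN improves the MNIST Score (MNIST inception Score) and Mode Score by 2.67 and 1.08, respectively, and reduces the FID (Fréchet Inception Distance) and MMD (Maximum Mean Discrepancy) by 55.12 and 2.56, respectively. In different Non-IID scenarios, the FedAvg (Federated Averaging) and FedProx (Federated Proximal) algorithms, after being combined with FDA-GAN, converge within 50 communication rounds, and their accuracy improves by at least 30.36 percentage points. FDA-GAN can therefore improve the quality and diversity of generated samples, and, when combined with baseline algorithms, can substantially improve the accuracy and convergence speed of the federated model.

Key words: generative adversarial network, federated learning, non-independent and identically distributed, client selection, data augmentation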

CLC Number: