Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (12): 3759-3765.DOI: 10.11772/j.issn.1001-9081.2023121740

• Artificial intelligence •

Federated learning client selection method based on label classification

Zucuan ZHANG1,2,3, Xuebin CHEN1,2,3, Rui GAO1,2,3, Yuanhuai ZOU1,2,3

  1. College of Sciences, North China University of Science and Technology, Tangshan Hebei 063210, China
    2. Hebei Province Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063210, China
    3. Tangshan Key Laboratory of Data Science (North China University of Science and Technology), Tangshan Hebei 063210, China
  • Received:2023-12-18 Revised:2024-04-13 Accepted:2024-04-17 Online:2024-05-07 Published:2024-12-10
  • Contact: Xuebin CHEN
  • About author:ZHANG Zucuan, born in 1998 in Xuzhou, Jiangsu, M. S. candidate, CCF member. His research interests include data security and federated learning.
    GAO Rui, born in 2000 in Nanjing, Jiangsu, M. S. candidate, CCF member. His research interests include data security and privacy protection.
    ZOU Yuanhuai, born in 1998 in Loudi, Hunan, M. S. candidate, CCF member. His research interests include data security and traffic detection.
  • Supported by:
    National Natural Science Foundation of China(U20A20179)


Abstract:

As a distributed machine learning method, federated learning can fully exploit the value in data while protecting data privacy. However, because traditional federated learning selects participating clients randomly, it adapts poorly to Not Identically and Independently Distributed (Non-IID) datasets. To address the low accuracy and slow convergence of federated learning models under Non-IID data, a Federated learning Client Selection method based on Label Classification (FedLCCS) was proposed. Firstly, the labels of the client datasets were classified and sorted according to frequency statistics. Then, clients holding high-frequency labels were selected to participate in training. Finally, models with different accuracy were obtained by adjusting the method's own parameters. Experimental results on the MNIST, Fashion-MNIST and CIFAR-10 datasets show that combining FedLCCS with the two baseline methods, Federated Averaging (FedAvg) and Federated Proximal (FedProx), outperforms the original methods under the initial dataset label selection ratio: accuracy improves by at least 9.13 and 6.53 percentage points respectively, convergence speed improves by at least 57.41% and 18.52%, and running time decreases by at least 7.60% and 17.62%. These results verify that FedLCCS improves the accuracy, convergence speed and running efficiency of federated models, and that it can train models with different accuracy to meet diverse demands.
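The three steps in the abstract (count label frequencies, sort, select clients holding high-frequency labels) can be sketched as follows. This is only an illustrative sketch: the function name `select_clients_by_label_frequency`, the majority-label selection rule, and the `ratio` parameter are assumptions, since the paper's exact algorithm is not reproduced here.

```python
from collections import Counter

def select_clients_by_label_frequency(client_labels, ratio):
    """Select clients whose data is dominated by globally high-frequency
    labels (a sketch of the FedLCCS idea, not the paper's exact method).

    client_labels: dict mapping client id -> list of sample labels
    ratio: fraction of distinct labels treated as "high-frequency"
    """
    # Step 1: aggregate label frequencies across all clients and sort
    # labels by descending frequency.
    global_counts = Counter()
    for labels in client_labels.values():
        global_counts.update(labels)
    sorted_labels = [lbl for lbl, _ in global_counts.most_common()]

    # Step 2: keep the top `ratio` fraction of labels as high-frequency.
    k = max(1, int(len(sorted_labels) * ratio))
    high_freq = set(sorted_labels[:k])

    # Step 3: select clients whose majority label is high-frequency.
    selected = []
    for cid, labels in client_labels.items():
        majority_label = Counter(labels).most_common(1)[0][0]
        if majority_label in high_freq:
            selected.append(cid)
    return selected
```

Varying `ratio` changes how many label classes (and hence clients) enter training, which mirrors the abstract's claim that adjusting the method's own parameters yields models with different accuracy.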

Key words: federated learning, Not Identically and Independently Distributed (Non-IID), client selection, frequency statistics, classification and sorting

