Journal of Computer Applications, 2025, Vol. 45, Issue (4): 1086-1094. DOI: 10.11772/j.issn.1001-9081.2024010132

• Artificial intelligence

Clustering federated learning algorithm for heterogeneous data

Qingli CHEN, Yuanbo GUO, Chen FANG

  1. Department of Cryptography Engineering, Information Engineering University, Zhengzhou Henan 450001, China
  • Received: 2024-02-05 Revised: 2024-04-04 Accepted: 2024-04-07 Online: 2024-05-09 Published: 2025-04-10
  • Contact: Qingli CHEN
  • About the authors: CHEN Qingli, born in 1998, native of Xinxiang, Henan, M. S. candidate. His research interests include federated learning.
    GUO Yuanbo, born in 1975, native of Zhouzhi, Shaanxi, Ph. D., professor. His research interests include big data security and situation awareness.
    FANG Chen, born in 1993, native of Tongling, Anhui, Ph. D., lecturer. His research interests include privacy protection and federated learning.

Abstract:

Federated Learning (FL) is a new paradigm for building machine learning models, with great potential for privacy preservation and communication efficiency. However, in real Internet of Things (IoT) scenarios, data are heterogeneous across client nodes, and learning a single unified global model reduces model accuracy. To solve this problem, a Clustering Federated Learning based on Feature Distribution (CFLFD) algorithm was proposed. In this algorithm, Principal Component Analysis (PCA) was applied to the features that each client node extracts with its model, and the PCA results were clustered so that client nodes with similar data distributions are grouped together and collaborate with each other, thereby achieving higher model accuracy. To demonstrate the effectiveness of the algorithm, extensive experiments were conducted on three datasets, comparing against four baseline algorithms. The results show that, compared with FedProx, CFLFD improves model accuracy by 1.12 and 3.76 percentage points on the CIFAR10 and Office-Caltech10 datasets, respectively.

Key words: Federated Learning (FL), clustering, feature extraction, Principal Component Analysis (PCA), personalized federated learning
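The abstract does not say how the PCA output is represented or which clustering algorithm groups the clients, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the CFLFD implementation. It assumes that each client runs PCA on its own feature matrix and uploads the projector onto its top-k principal subspace, and that the server groups clients with plain k-means. The helper `simulate_client_features` is hypothetical and stands in for features a real local model would extract; `client_descriptor`, `kmeans`, and `cluster_clients` are likewise illustrative names.

```python
import numpy as np


def client_descriptor(features: np.ndarray, k: int) -> np.ndarray:
    """Run PCA on one client's (n_samples x d) feature matrix.

    Assumption: the client summarizes its PCA result as the flattened
    projector U_k U_k^T onto the top-k principal subspace. Unlike raw
    eigenvectors, the projector does not depend on the arbitrary sign or
    rotation of the basis, so descriptors from different clients compare directly.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    u_k = vt[:k].T                      # d x k orthonormal basis
    return (u_k @ u_k.T).ravel()        # length d*d descriptor


def kmeans(x: np.ndarray, n_clusters: int, n_iter: int = 50) -> np.ndarray:
    """Plain k-means with deterministic farthest-point initialization (assumed choice)."""
    centers = [x[0]]
    for _ in range(1, n_clusters):
        # Squared distance from every point to its nearest already-chosen center.
        nearest = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[nearest.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):     # keep the old center if a cluster empties
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def cluster_clients(client_features, n_clusters: int, k: int = 2) -> np.ndarray:
    """Server side: cluster clients by the PCA descriptors they upload."""
    descriptors = np.stack([client_descriptor(f, k) for f in client_features])
    return kmeans(descriptors, n_clusters)


def simulate_client_features(rng, basis, n=200, noise=0.05):
    """Hypothetical stand-in for features extracted by a client's local model."""
    z = rng.normal(size=(n, basis.shape[0]))
    return z @ basis + noise * rng.normal(size=(n, basis.shape[1]))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    eye = np.eye(6)
    # Clients alternate between two feature distributions (two 2-D subspaces).
    clients = [simulate_client_features(rng, eye[:2] if i % 2 == 0 else eye[2:4])
               for i in range(6)]
    # Expected grouping: clients 0, 2, 4 in one cluster and 1, 3, 5 in the other.
    print(cluster_clients(clients, n_clusters=2))
```

In this sketch each client uploads only a d×d summary of its feature distribution rather than raw data. Comparing subspace projectors instead of raw principal vectors avoids spurious differences caused by the sign and ordering ambiguity of PCA.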

