Clustering federated learning algorithm for heterogeneous data

doi:10.11772/j.issn.1001-9081.2024010132

Abstract

Federated Learning (FL) is a new paradigm for building machine learning models, with great potential for privacy preservation and communication efficiency. In real Internet of Things (IoT) scenarios, however, data are heterogeneous across client nodes, and learning a single unified global model reduces model accuracy. To solve this problem, a Clustering Federated Learning based on Feature Distribution (CFLFD) algorithm was proposed. In this algorithm, the features that each client node extracts with the model are processed by Principal Component Analysis (PCA), and the PCA results are clustered so that client nodes with similar data distributions are grouped together and collaborate with one another, thereby achieving higher model accuracy. To demonstrate the effectiveness of the algorithm, extensive experiments were conducted on three datasets against four benchmark algorithms. The results show that, compared with FedProx, the algorithm improves model accuracy by 1.12 and 3.76 percentage points on the CIFAR10 and Office-Caltech10 datasets, respectively.

Key words: Federated Learning (FL), clustering, feature extraction, Principal Component Analysis (PCA), personalized federated learning
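
The clustering step described in the abstract can be illustrated with a minimal sketch. It is one possible reading, not the authors' implementation: it assumes each client is summarised by the mean of its extracted feature vectors, that PCA is computed with an SVD, and that the clustering is a plain k-means with a deterministic farthest-point start. The abstract does not name the clustering method, and the function names and synthetic features below are hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Center the rows of X and project them onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def cluster_clients(client_features, n_components=2, k=2):
    """Assumed pipeline: mean feature vector per client -> PCA -> k-means."""
    signatures = np.stack([f.mean(axis=0) for f in client_features])
    return kmeans(pca_project(signatures, n_components), k)

# Synthetic stand-in for extracted features: six clients, two latent distributions.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=m, scale=1.0, size=(20, 8)) for m in (0, 0, 0, 5, 5, 5)]
labels = cluster_clients(clients)
print(labels)
```

On this toy input the first three clients end up in one cluster and the last three in the other. In a federated setting, clients that share a label would then train together, which is the collaboration the abstract refers to.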

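Abstract (translated from Chinese):

Federated Learning (FL) is a new paradigm for building machine learning models that shows great potential in privacy preservation and communication efficiency. However, in real-world Internet of Things (IoT) scenarios, the data held by client nodes are heterogeneous, and learning a unified global model lowers model accuracy. To address this problem, a Clustering Federated Learning based on Feature Distribution (CFLFD) algorithm is proposed. In this algorithm, Principal Component Analysis (PCA) is applied to the features that each client node extracts from the model, and the results are clustered so that client nodes with similar data distributions are grouped together to collaborate, which improves model accuracy. To verify the effectiveness of the algorithm, extensive experiments were carried out on 3 datasets with 4 benchmark algorithms. The experimental results show that, compared with FedProx, the CFLFD algorithm improves model accuracy by 1.12 and 3.76 percentage points on the CIFAR10 and Office-Caltech10 datasets, respectively.

Keywords (translated from Chinese): federated learning, clustering, feature extraction, principal component analysis, personalized federated learning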

CLC Number:

TP393

Qingli CHEN, Yuanbo GUO, Chen FANG. Clustering federated learning algorithm for heterogeneous data[J]. Journal of Computer Applications, 2025, 45(4): 1086-1094.

Figures/Tables 11

Fig. 1 System architecture of CFLFD algorithm

Tab. 1 Calinski-Harabasz index values under different K values

K	FMNIST	CIFAR10	Rotated CIFAR10	MNIST ($\alpha = 0.1$)
2	7.84	9.18	14.86	5.53
3	6.45	6.85	9.68	5.69
4	4.94	6.41	8.85	5.25
5	4.46	6.23	9.43	4.63
6	4.09	5.90	9.59	4.31
7	3.86	5.62	15.16	4.42
8	3.63	5.83	13.22	4.28
9	3.43	5.94	14.13	4.16
10	3.47	6.79	11.92	4.62
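
For readers unfamiliar with the index in Tab. 1, the sketch below computes the Calinski-Harabasz index from its standard definition (between-cluster dispersion divided by within-cluster dispersion, each normalised by its degrees of freedom) and then reads off the K with the largest value in each column of the table. Preferring the largest value is the usual convention for this index and is an assumption here, since the table itself does not say which K was selected. The helper names are hypothetical.

```python
def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)] for n points in k clusters."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = within = 0.0
    for rows in clusters.values():
        center = mean(rows)
        between += len(rows) * sqdist(center, overall)  # B: spread of cluster centers
        within += sum(sqdist(r, center) for r in rows)  # W: spread inside clusters
    return (between / (k - 1)) / (within / (n - k))

# Toy check: two tight, well-separated pairs give a large index.
toy = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(calinski_harabasz(toy, [0, 0, 1, 1]))  # 50.0

# Values copied from Tab. 1, for K = 2..10.
K_VALUES = list(range(2, 11))
CH_TABLE = {
    "FMNIST": [7.84, 6.45, 4.94, 4.46, 4.09, 3.86, 3.63, 3.43, 3.47],
    "CIFAR10": [9.18, 6.85, 6.41, 6.23, 5.90, 5.62, 5.83, 5.94, 6.79],
    "Rotated CIFAR10": [14.86, 9.68, 8.85, 9.43, 9.59, 15.16, 13.22, 14.13, 11.92],
    "MNIST (alpha = 0.1)": [5.53, 5.69, 5.25, 4.63, 4.31, 4.42, 4.28, 4.16, 4.62],
}

def best_k(scores):
    """Return the K whose index value is largest."""
    return K_VALUES[max(range(len(scores)), key=scores.__getitem__)]

for name, scores in CH_TABLE.items():
    print(name, best_k(scores))
```

Under that convention the table points to K = 2 for FMNIST and CIFAR10, K = 7 for Rotated CIFAR10, and K = 3 for MNIST.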

Fig. 2 Model accuracy under different data distributions

Tab. 2 Client data division of Office-Caltech10 dataset

Data type	Client ID	Data type	Client ID
Amazon	1, 2, 3, 4, 5	Webcam	11, 12, 13, 14, 15
DSLR	6, 7, 8, 9, 10	Caltech	16, 17, 18, 19, 20

Tab. 3 Accuracy comparison of different algorithms (%)

Algorithm	FMNIST	CIFAR10	Rotated CIFAR10	MNIST ($\alpha = 0.1$)
Local	82.13	46.03	26.30	98.23
FedAvg	82.01	48.36	26.83	66.65
FedProx	83.06	50.86	27.02	69.32
FedGen	81.25	47.21	27.33	65.47
CFLFD	83.61	51.98	27.89	70.12

Fig. 3 Model accuracies of different algorithms on IID datasets

Fig. 4 ROC curves and AUC values on different datasets

Tab. 4 Accuracy comparison of different client nodes on Office-Caltech10 dataset (%)

Algorithm	Amazon					DSLR
Algorithm	client1	client2	client3	client4	client5	client6	client7	client8	client9	client10
Local	61.66	63.75	64.16	62.50	62.91	72.50	67.50	62.50	65.00	67.50
FedAvg	65.83	65.83	63.33	64.16	64.16	62.50	65.00	62.50	62.50	67.50
FedProx	62.08	62.50	61.66	63.33	62.50	70.00	67.50	67.50	67.50	67.50
FedGen	64.16	62.91	62.50	63.75	63.75	67.50	67.50	62.50	67.50	65.00
CFLFD	66.66	68.33	68.33	67.50	66.25	72.50	75.00	72.50	72.50	75.00
Algorithm	Webcam					Caltech
Algorithm	client11	client12	client13	client14	client15	client16	client17	client18	client19	client20
Local	67.56	68.91	72.97	70.27	67.56	41.99	44.83	41.28	40.21	37.36
FedAvg	68.91	66.21	68.91	67.56	70.27	44.48	44.48	45.19	45.19	44.48
FedProx	78.37	78.37	79.72	82.43	77.02	42.70	44.12	41.28	42.70	41.99
FedGen	75.67	78.37	77.02	77.02	74.32	37.36	36.29	38.07	39.14	37.72
CFLFD	79.72	79.72	81.08	83.78	82.43	44.83	45.19	44.83	45.55	44.12

Tab. 5 Average accuracies on different data domains (%)

Algorithm	Amazon	DSLR	Webcam	Caltech	Average
Local	62.99	67.00	69.45	41.13	60.14
FedAvg	64.66	64.00	68.37	44.76	60.44
FedProx	62.41	68.00	79.18	42.55	63.03
FedGen	63.41	66.00	76.48	37.71	60.90
CFLFD	67.41	73.50	81.34	44.90	66.79
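
The gains quoted in the abstract can be re-derived from the tables as a simple difference of percentages. The sketch below does this for CIFAR10 (Tab. 3) and Office-Caltech10 (Tab. 5). It also checks the "Average" column under the assumption that it is the unweighted mean of the four domain columns. The table does not state how the column was computed, so a tolerance of about 0.01 is allowed for rounding. The helper names are hypothetical.

```python
# Accuracy values (%) copied from Tab. 3 (CIFAR10 column) and Tab. 5.
CIFAR10 = {"FedProx": 50.86, "CFLFD": 51.98}
OFFICE_CALTECH10 = {
    # algorithm: (Amazon, DSLR, Webcam, Caltech, reported average)
    "Local":   (62.99, 67.00, 69.45, 41.13, 60.14),
    "FedAvg":  (64.66, 64.00, 68.37, 44.76, 60.44),
    "FedProx": (62.41, 68.00, 79.18, 42.55, 63.03),
    "FedGen":  (63.41, 66.00, 76.48, 37.71, 60.90),
    "CFLFD":   (67.41, 73.50, 81.34, 44.90, 66.79),
}

def gain(new, baseline):
    """Improvement in percentage points (difference of two percentages)."""
    return round(new - baseline, 2)

def domain_mean(algorithm):
    """Unweighted mean of the four domain accuracies (assumed averaging rule)."""
    *domains, _reported = OFFICE_CALTECH10[algorithm]
    return sum(domains) / len(domains)

print(gain(CIFAR10["CFLFD"], CIFAR10["FedProx"]))                            # 1.12
print(gain(OFFICE_CALTECH10["CFLFD"][-1], OFFICE_CALTECH10["FedProx"][-1]))  # 3.76

for name, row in OFFICE_CALTECH10.items():
    # Recomputed mean next to the reported value; they agree to about 0.01.
    print(name, round(domain_mean(name), 4), row[-1])
```

Both differences match the 1.12 and 3.76 percentage points stated in the abstract.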

Fig. 5 Improvements of FedAvg, FedProx, FedGen, and the proposed algorithm over the Local Training algorithm in different domains

Fig. 6 Reconstructed images on CIFAR10 dataset
