Journal of Computer Applications (《计算机应用》) official website ›› 2024, Vol. 44 ›› Issue (10): 3011-3020. DOI: 10.11772/j.issn.1001-9081.2023101475
About author:
YIN Chunyong, born in 1977 in Weifang, Shandong, Ph. D., professor, Ph. D. supervisor. His research interests include cyberspace security, big data mining and privacy protection, artificial intelligence and novel computing. E-mail: yinchunyong@hotmail.com
Chunyong YIN1, Yongcheng ZHOU2
Received:
2023-11-02
Revised:
2024-02-23
Accepted:
2024-02-26
Online:
2024-10-15
Published:
2024-10-10
Contact:
Chunyong YIN
About author:
ZHOU Yongcheng, born in 2000, M. S. candidate. His research interests include federated learning.
Supported by:
Abstract:
Federated learning (FL) is a distributed machine learning approach in which clients jointly train a global model; however, a single global model copes poorly with data drawn from multiple distributions. Clustered federated learning addresses this multi-distribution challenge by grouping clients so that several shared models can be optimized. Within this paradigm, server-side clustering struggles to correct misclassified clients, while client-side clustering is highly sensitive to the choice of initial models. To solve these problems, an Automatically Adjusted Clustered Federated Learning (AACFL) framework was proposed, which integrates server-side and client-side clustering through double-ended clustering. First, double-ended clustering partitions the clients into adjustable clusters; then local client identities are adjusted automatically; finally, the correct client clusters are obtained. Experimental results on three classic federated datasets under non-IID settings show that AACFL can recover the correct clusters through adjustment even when the double-ended clustering results contain errors. With 4 clusters and 100 clients, compared with methods such as the Federated Averaging (FedAvg) algorithm, Clustered Federated Learning (CFL) and the Iterative Federated Clustering Algorithm (IFCA), AACFL converges faster, reaches the correct clustering result sooner, and improves accuracy by 0.20 to 23.16 percentage points on average. These results verify that the proposed framework clusters clients efficiently and improves both model convergence speed and accuracy.
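The client-side step described in the abstract, where each client chooses the cluster whose model best fits its local data, can be illustrated with an IFCA-style selection rule. This is a minimal hypothetical sketch, not the paper's AACFL implementation; the functions `mse` and `select_cluster` and the toy one-parameter "models" are illustrative assumptions only.

```python
# Hypothetical sketch of client-side cluster-identity selection,
# in the spirit of IFCA: each client evaluates every cluster model
# on its own data and joins the best-fitting cluster.

def mse(w, data):
    """Mean squared error of a toy 1-D linear model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def select_cluster(cluster_models, client_data):
    """Return the index of the cluster model with the lowest local loss."""
    losses = [mse(w, client_data) for w in cluster_models]
    return losses.index(min(losses))

cluster_models = [1.0, -1.0]          # two cluster models (slopes)
client_a = [(1, 1.1), (2, 1.9)]       # data roughly following y = x
client_b = [(1, -0.9), (2, -2.1)]     # data roughly following y = -x
```

Under this rule, `client_a` would join cluster 0 and `client_b` cluster 1; AACFL's double-ended design additionally lets the server-side grouping be corrected by such local re-selection.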
Chunyong YIN, Yongcheng ZHOU. Automatically adjusted clustered federated learning for double-ended clustering[J]. Journal of Computer Applications, 2024, 44(10): 3011-3020.
Tab. 1 Accuracy evaluation of different algorithms under k=2, n=50 (unit: %)

| Algorithm | EMNIST | CIFAR-10 | Fashion-MNIST |
| --- | --- | --- | --- |
| FedAvg | 70.18 | 48.93 | 81.46 |
| FeSEM | 78.24 | 55.41 | 84.80 |
| WeCFL | 78.46 | 55.76 | 85.10 |
| IFCA | 79.15 | 57.78 | 85.79 |
| CFL | 79.14 | 55.57 | 84.82 |
| AACFL | 79.20 | 56.85 | 86.13 |
Tab. 2 Accuracy evaluation of different algorithms under k=4, n=100 (unit: %)

| Algorithm | EMNIST | CIFAR-10 | Fashion-MNIST |
| --- | --- | --- | --- |
| FedAvg | 56.19 | 42.62 | 72.82 |
| FeSEM | 74.20 | 53.12 | 82.13 |
| WeCFL | 74.87 | 53.68 | 82.68 |
| IFCA | 78.97 | 53.20 | 85.64 |
| CFL | 78.87 | 56.67 | 85.53 |
| AACFL | 79.35 | 58.29 | 85.84 |
Tab. 3 Accuracy evaluation of AACFL under different learning rates (accuracy in %)

| Learning rate | EMNIST | CIFAR-10 | Fashion-MNIST |
| --- | --- | --- | --- |
| 0.10 | 78.28 | 57.44 | 86.76 |
| 0.07 | 78.12 | 57.34 | 86.06 |
| 0.04 | 77.26 | 56.90 | 84.46 |
| 0.01 | 72.89 | 50.49 | 80.46 |
Tab. 4 ARI evaluation and round consumption evaluation of AACFL with different clustered client participation rates q_κ

| q_κ | ARI (EMNIST) | Rounds (EMNIST) | ARI (CIFAR-10) | Rounds (CIFAR-10) | ARI (Fashion-MNIST) | Rounds (Fashion-MNIST) |
| --- | --- | --- | --- | --- | --- | --- |
| 0.4 | 0.82 | 32 | 0.79 | 12 | 0.94 | 10 |
| 0.6 | 0.92 | 29 | 0.89 | 16 | 0.94 | 4 |
| 0.8 | 0.97 | 9 | 0.87 | 20 | 0.94 | 7 |
| 1.0 | 0.97 | 11 | 0.97 | 5 | 0.97 | 3 |
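The ARI values above are Adjusted Rand Index scores between the recovered client clusters and the ground-truth grouping. A pure-Python sketch of the standard contingency-table ARI formula (this is the textbook definition, not code from the paper; the helper name `adjusted_rand_index` is illustrative):

```python
# Adjusted Rand Index between two label assignments of the same clients.
# ARI = 1 for identical partitions (up to relabeling); ~0 for random ones.
from math import comb
from collections import Counter

def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))  # contingency-table cells
    a = Counter(labels_true)                        # row sums
    b = Counter(labels_pred)                        # column sums
    sum_ij = sum(comb(v, 2) for v in pairs.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(n, 2)           # chance-level index
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Because ARI is invariant to cluster relabeling, a clustering that swaps the names of two clusters but groups clients identically still scores 1.0, which is why it suits evaluating recovered federated clusters.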
Tab. 5 Time consumption of different algorithms under k=2, n=50 (unit: s)

| Algorithm | EMNIST | CIFAR-10 | Fashion-MNIST |
| --- | --- | --- | --- |
| FedAvg | 2 146 | 5 415 | 2 812 |
| FeSEM | 2 567 | 5 897 | 3 125 |
| WeCFL | 2 583 | 5 812 | 3 217 |
| IFCA | 4 054 | 8 240 | 4 545 |
| CFL | 2 348 | 5 755 | 3 064 |
| AACFL | 2 787 | 5 990 | 3 364 |
Tab. 6 Time consumption of different algorithms under k=4, n=100 (unit: s)

| Algorithm | EMNIST | CIFAR-10 | Fashion-MNIST |
| --- | --- | --- | --- |
| FedAvg | 5 116 | 9 214 | 5 042 |
| FeSEM | 5 623 | 9 845 | 5 543 |
| WeCFL | 5 711 | 9 834 | 5 604 |
| IFCA | 7 132 | 14 564 | 7 465 |
| CFL | 5 521 | 9 664 | 5 402 |
| AACFL | 5 831 | 10 168 | 5 799 |
References:
1. McMAHAN H B, MOORE E, RAMAGE D, et al. Communication-efficient learning of deep networks from decentralized data[C]// Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. New York: JMLR.org, 2017: 1273-1282.
2. LI X, HUANG K X, YANG W H, et al. On the convergence of FedAvg on non-IID data[EB/OL]. (2020-06-25) [2023-09-12].
3. YANG H, FANG M, LIU J. Achieving linear speedup with partial worker participation in non-IID federated learning[EB/OL]. (2021-05-04) [2023-09-12].
4. DUAN M, LIU D, CHEN X, et al. Astraea: self-balancing federated learning for improving classification accuracy of mobile deep learning applications[C]// Proceedings of the 2019 IEEE 37th International Conference on Computer Design. Piscataway: IEEE, 2019: 246-254.
5. LUO C Y, CHEN X B, LIU Y, et al. A federated ensemble algorithm for multi-source data security[J]. Computer Engineering and Science, 2021, 43(8): 1387-1397.
6. HAO M, LI H, XU G, et al. Towards efficient and privacy-preserving federated deep learning[C]// Proceedings of the 2019 IEEE International Conference on Communications. Piscataway: IEEE, 2019: 1-6.
7. FANG C, GUO Y, WANG N, et al. Highly efficient federated learning with strong privacy preservation in cloud computing[J]. Computers and Security, 2020, 96: No.101889.
8. LI Q, WEN Z, WU Z, et al. A survey on federated learning systems: vision, hype and reality for data privacy and protection[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 35(4): 3347-3366.
9. HARD A, RAO K, MATHEWS R, et al. Federated learning for mobile keyboard prediction[EB/OL]. (2019-02-28) [2023-09-12].
10. BRISIMI T S, CHEN R, MELA T, et al. Federated learning of predictive models from federated electronic health records[J]. International Journal of Medical Informatics, 2018, 112: 59-67.
11. LI T, SAHU A K, TALWALKAR A, et al. Federated learning: challenges, methods, and future directions[J]. IEEE Signal Processing Magazine, 2020, 37(3): 50-60.
12. JIN H, BAI D, YAO D, et al. Personalized edge intelligence via federated self-knowledge distillation[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34(2): 567-580.
13. LI H, CAI Z, WANG J, et al. FedTP: federated learning by Transformer personalization[EB/OL]. (2023-04-18) [2023-09-12].
14. SATTLER F, MÜLLER K R, SAMEK W. Clustered federated learning: model-agnostic distributed multitask optimization under privacy constraints[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(8): 3710-3722.
15. SATTLER F, MÜLLER K R, WIEGAND T, et al. On the Byzantine robustness of clustered federated learning[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 8861-8865.
16. DENNIS D K, LI T, SMITH V. Heterogeneity for the win: one-shot federated clustering[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 2611-2620.
17. GHOSH A, CHUNG J, YIN D, et al. An efficient framework for clustered federated learning[J]. IEEE Transactions on Information Theory, 2022, 68(12): 8076-8091.
18. BRIGGS C, FAN Z, ANDRAS P. Federated learning with hierarchical clustering of local updates to improve training on non-IID data[C]// Proceedings of the 2020 International Joint Conference on Neural Networks. Piscataway: IEEE, 2020: 1-9.
19. LI C, LI G, VARSHNEY P K. Federated learning with soft clustering[J]. IEEE Internet of Things Journal, 2022, 9(10): 7773-7782.
20. DUAN M, LIU D, CHEN X, et al. Self-balancing federated learning with global imbalanced data in mobile systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(1): 59-71.
21. LI T, SAHU A K, ZAHEER M, et al. Federated optimization in heterogeneous networks[EB/OL]. [2023-09-12].
22. MOTHUKURI V, PARIZI R M, POURIYEH S, et al. A survey on security and privacy of federated learning[J]. Future Generation Computer Systems, 2021, 115: 619-640.
23. ZHAO Y, LI M, LAI L, et al. Federated learning with non-IID data[EB/OL]. (2022-07-21) [2023-09-12].
24. LU R, ZHANG W, WANG Y, et al. Auction-based cluster federated learning in mobile edge computing systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34(4): 1145-1158.
25. DUAN M, LIU D, JI X, et al. Flexible clustered federated learning for client-level data distribution shift[J]. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(11): 2661-2674.
26. ZHANG Y, LIU D, DUAN M, et al. FedMDS: an efficient model discrepancy-aware semi-asynchronous clustered federated learning framework[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34(3): 1007-1019.
27. LLOYD S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2): 129-137.
28. RUAN Y, JOE-WONG C. FedSoft: soft clustered federated learning with proximal local updating[C]// Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 8124-8131.
29. TIAN P, LIAO W X, YU W, et al. WSCC: a weight-similarity-based client clustering approach for non-IID federated learning[J]. IEEE Internet of Things Journal, 2022, 9(20): 20243-20256.
30. LONG G, XIE M, SHEN T, et al. Multi-center federated learning: clients clustering for better personalization[J]. World Wide Web, 2023, 26: 481-500.
31. AGRAWAL S, SARKAR S, ALAZAB M, et al. Genetic CFL: hyperparameter optimization in clustered federated learning[J]. Computational Intelligence and Neuroscience, 2021, 2021: No.7156420.
32. LU C Y, DENG S, MA W B, et al. Clustered federated learning methods based on DBSCAN clustering[J]. Computer Science, 2022, 49(6A): 232-237.
33. CHANG L M, LIU Y H, XU S Z. Clustering federated learning based on data distribution[J]. Application Research of Computers, 2023, 40(6): 1697-1701.
34. STALLMANN M, WILBIK A. Towards federated clustering: a Federated Fuzzy c-Means algorithm (FFCM)[EB/OL]. (2022-01-18) [2023-09-12].
35. XIE H, MA J, XIONG L, et al. Federated graph classification over non-IID graphs[C]// Proceedings of the 35th Conference on Neural Information Processing Systems. New York: ACM, 2024: 18839-18852.
36. COHEN G, AFSHAR S, TAPSON J, et al. EMNIST: extending MNIST to handwritten letters[C]// Proceedings of the 2017 International Joint Conference on Neural Networks. Piscataway: IEEE, 2017: 2921-2926.
37. KRIZHEVSKY A. Learning multiple layers of features from tiny images[D]. Toronto: University of Toronto, 2009: 1-60.
38. XIAO H, RASUL K, VOLLGRAF R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms[EB/OL]. (2017-09-15) [2023-09-12].
39. MA J, LONG G, ZHOU T, et al. On the convergence of clustered federated learning[EB/OL]. (2022-06-07) [2023-09-12].