Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (4): 1086-1094. DOI: 10.11772/j.issn.1001-9081.2024010132
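Received: 2024-02-05
Revised: 2024-04-04
Accepted: 2024-04-07
Online: 2024-05-09
Published: 2025-04-10
Corresponding author: Qingli CHEN
About the author: CHEN Qingli, born in 1998 in Xinxiang, Henan, M. S. candidate. His research interests include federated learning.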
Qingli CHEN, Yuanbo GUO, Chen FANG
Abstract:
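Federated learning (FL) is a new paradigm for building machine learning models that shows great potential for privacy protection and communication efficiency. In real-world Internet of Things (IoT) scenarios, however, the data held by client nodes are heterogeneous, so learning a single unified global model degrades model accuracy. To address this problem, a Clustered Federated Learning algorithm based on Feature Distribution (CFLFD) is proposed. In this algorithm, Principal Component Analysis (PCA) is applied to the features that each client node extracts with the model, and the PCA results are then clustered so that client nodes with similar data distributions are grouped together and collaborate with one another, thereby improving model accuracy. To verify the effectiveness of the algorithm, extensive experiments were conducted on three datasets against four baseline algorithms. The experimental results show that, compared with FedProx, CFLFD improves model accuracy by 1.12 and 3.76 percentage points on the CIFAR10 and Office-Caltech10 datasets, respectively.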
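The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the PCA output is turned into one vector per client, nor which clustering algorithm is used, so everything here is an assumption: each client is summarised by the mean of its feature vectors, PCA is computed with an SVD across clients, and a plain k-means with farthest-first initialisation does the grouping. The synthetic features and the helper names `pca_project` and `kmeans` are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto their top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [X[0]]
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for model-extracted features (not data from the paper):
# clients 0-2 draw from one distribution, clients 3-5 from another.
rng = np.random.default_rng(0)
client_features = [
    rng.normal(loc=(0.0 if cid < 3 else 5.0), scale=1.0, size=(40, 8))
    for cid in range(6)
]

# Assumption: one signature per client, here the mean feature vector.
signatures = np.stack([f.mean(axis=0) for f in client_features])
reduced = pca_project(signatures, n_components=2)
cluster_ids = kmeans(reduced, k=2)
print(cluster_ids)
```

In a clustered FL setup, clients sharing a cluster id would then aggregate their model updates only with each other; that aggregation step is outside this sketch.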
Qingli CHEN, Yuanbo GUO, Chen FANG. Clustering federated learning algorithm for heterogeneous data[J]. Journal of Computer Applications, 2025, 45(4): 1086-1094.
K | FMNIST | CIFAR10 | Rotated CIFAR10 | MNIST( |
---|---|---|---|---|
2 | 7.84 | 9.18 | 14.86 | 5.53 |
3 | 6.45 | 6.85 | 9.68 | 5.69 |
4 | 4.94 | 6.41 | 8.85 | 5.25 |
5 | 4.46 | 6.23 | 9.43 | 4.63 |
6 | 4.09 | 5.90 | 9.59 | 4.31 |
7 | 3.86 | 5.62 | 15.16 | 4.42 |
8 | 3.63 | 5.83 | 13.22 | 4.28 |
9 | 3.43 | 5.94 | 14.13 | 4.16 |
10 | 3.47 | 6.79 | 11.92 | 4.62 |
Tab. 1 Calinski-Harabasz index values under different K values
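For readers unfamiliar with the index in Tab. 1, the following sketch computes the Calinski-Harabasz (CH) score from its standard definition, the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom. A larger value conventionally indicates better-separated clusters. The toy points and the function name `calinski_harabasz` are illustrative assumptions. The final lookup simply reports which K has the largest value in each column of Tab. 1; whether the authors selected K this way is not stated on this page.

```python
def calinski_harabasz(points, labels):
    """CH = [B / (K - 1)] / [W / (N - K)] for N points in K clusters."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    def mean(ps):
        return [sum(p[d] for p in ps) / len(ps) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = within = 0.0
    for members in clusters.values():
        center = mean(members)
        between += len(members) * sqdist(center, overall)   # B term
        within += sum(sqdist(p, center) for p in members)   # W term
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two tight groups far apart along the x axis.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
ch_good = calinski_harabasz(points, [0, 0, 1, 1])  # split along x
ch_bad = calinski_harabasz(points, [0, 1, 0, 1])   # split along y
print(ch_good, ch_bad)

# Values copied from Tab. 1, for K = 2..10 ("MNIST(" is truncated in the source).
tab1 = {
    "FMNIST": [7.84, 6.45, 4.94, 4.46, 4.09, 3.86, 3.63, 3.43, 3.47],
    "CIFAR10": [9.18, 6.85, 6.41, 6.23, 5.90, 5.62, 5.83, 5.94, 6.79],
    "Rotated CIFAR10": [14.86, 9.68, 8.85, 9.43, 9.59, 15.16, 13.22, 14.13, 11.92],
    "MNIST(": [5.53, 5.69, 5.25, 4.63, 4.31, 4.42, 4.28, 4.16, 4.62],
}
best_k = {name: 2 + vals.index(max(vals)) for name, vals in tab1.items()}
print(best_k)
```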
Data type | Client ID | Data type | Client ID |
---|---|---|---|
Amazon | 1,2,3,4,5 | Webcam | 11,12,13,14,15 |
DSLR | 6,7,8,9,10 | Caltech | 16,17,18,19,20 |
Tab. 2 Client data division of Office-Caltech10 dataset
Algorithm | FMNIST | CIFAR10 | Rotated CIFAR10 | MNIST( |
---|---|---|---|---|
Local | 82.13 | 46.03 | 26.30 | 98.23 |
FedAvg | 82.01 | 48.36 | 26.83 | 66.65 |
FedProx | 83.06 | 50.86 | 27.02 | 69.32 |
FedGen | 81.25 | 47.21 | 27.33 | 65.47 |
CFLFD | 83.61 | 51.98 | 27.89 | 70.12 |
Tab. 3 Accuracy comparison of different algorithms (%)
Algorithm | client1 (Amazon) | client2 (Amazon) | client3 (Amazon) | client4 (Amazon) | client5 (Amazon) | client6 (DSLR) | client7 (DSLR) | client8 (DSLR) | client9 (DSLR) | client10 (DSLR) |
---|---|---|---|---|---|---|---|---|---|---|
Local | 61.66 | 63.75 | 64.16 | 62.50 | 62.91 | 72.50 | 67.50 | 62.50 | 65.00 | 67.50 |
FedAvg | 65.83 | 65.83 | 63.33 | 64.16 | 64.16 | 62.50 | 65.00 | 62.50 | 62.50 | 67.50 |
FedProx | 62.08 | 62.50 | 61.66 | 63.33 | 62.50 | 70.00 | 67.50 | 67.50 | 67.50 | 67.50 |
FedGen | 64.16 | 62.91 | 62.50 | 63.75 | 63.75 | 67.50 | 67.50 | 62.50 | 67.50 | 65.00 |
CFLFD | 66.66 | 68.33 | 68.33 | 67.50 | 66.25 | 72.50 | 75.00 | 72.50 | 72.50 | 75.00 |

Algorithm | client11 (Webcam) | client12 (Webcam) | client13 (Webcam) | client14 (Webcam) | client15 (Webcam) | client16 (Caltech) | client17 (Caltech) | client18 (Caltech) | client19 (Caltech) | client20 (Caltech) |
---|---|---|---|---|---|---|---|---|---|---|
Local | 67.56 | 68.91 | 72.97 | 70.27 | 67.56 | 41.99 | 44.83 | 41.28 | 40.21 | 37.36 |
FedAvg | 68.91 | 66.21 | 68.91 | 67.56 | 70.27 | 44.48 | 44.48 | 45.19 | 45.19 | 44.48 |
FedProx | 78.37 | 78.37 | 79.72 | 82.43 | 77.02 | 42.70 | 44.12 | 41.28 | 42.70 | 41.99 |
FedGen | 75.67 | 78.37 | 77.02 | 77.02 | 74.32 | 37.36 | 36.29 | 38.07 | 39.14 | 37.72 |
CFLFD | 79.72 | 79.72 | 81.08 | 83.78 | 82.43 | 44.83 | 45.19 | 44.83 | 45.55 | 44.12 |
Tab. 4 Accuracy comparison of different client nodes on Office-Caltech10 dataset (%)
Algorithm | Amazon | DSLR | Webcam | Caltech | Average |
---|---|---|---|---|---|
Local | 62.99 | 67.00 | 69.45 | 41.13 | 60.14 |
FedAvg | 64.66 | 64.00 | 68.37 | 44.76 | 60.44 |
FedProx | 62.41 | 68.00 | 79.18 | 42.55 | 63.03 |
FedGen | 63.41 | 66.00 | 76.48 | 37.71 | 60.90 |
CFLFD | 67.41 | 73.50 | 81.34 | 44.90 | 66.79 |
Tab. 5 Average accuracies on different data domains (%)
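The domain averages in Tab. 5 appear to be the means of the five per-client accuracies in Tab. 4 for each domain, and the final column the mean over the four domains. The sketch below checks that reading for FedProx and CFLFD only; the helper names `domain_means` and `overall_mean` are hypothetical. Because the published table rounds (or truncates) to two decimals, the recomputed gap comes out at about 3.75 rather than exactly the 3.76 percentage points quoted in the abstract.

```python
# Per-client accuracies (%) copied from Tab. 4, grouped by domain as in Tab. 2.
per_client = {
    "FedProx": {
        "Amazon": [62.08, 62.50, 61.66, 63.33, 62.50],
        "DSLR": [70.00, 67.50, 67.50, 67.50, 67.50],
        "Webcam": [78.37, 78.37, 79.72, 82.43, 77.02],
        "Caltech": [42.70, 44.12, 41.28, 42.70, 41.99],
    },
    "CFLFD": {
        "Amazon": [66.66, 68.33, 68.33, 67.50, 66.25],
        "DSLR": [72.50, 75.00, 72.50, 72.50, 75.00],
        "Webcam": [79.72, 79.72, 81.08, 83.78, 82.43],
        "Caltech": [44.83, 45.19, 44.83, 45.55, 44.12],
    },
}


def domain_means(acc_by_domain):
    """Mean accuracy of the clients inside each domain."""
    return {dom: sum(vals) / len(vals) for dom, vals in acc_by_domain.items()}


def overall_mean(acc_by_domain):
    """Unweighted mean of the per-domain means (every domain has 5 clients)."""
    means = domain_means(acc_by_domain)
    return sum(means.values()) / len(means)


cflfd_means = domain_means(per_client["CFLFD"])
gap = overall_mean(per_client["CFLFD"]) - overall_mean(per_client["FedProx"])
print({dom: round(m, 2) for dom, m in cflfd_means.items()})
print(f"CFLFD minus FedProx: {gap:.2f} percentage points")
```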
Fig. 5 Improvements of FedAvg, FedProx, FedGen, and the proposed algorithm over the Local Training algorithm in different domains
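Since the figure itself is not reproduced here, the quantities it plots can be approximated from Tab. 5. The sketch below assumes that "improvement" means the difference in average accuracy, in percentage points, between each federated algorithm and Local training within a domain; the figure may define it differently.

```python
# Domain-level average accuracies (%) copied from Tab. 5.
table5 = {
    "Local": {"Amazon": 62.99, "DSLR": 67.00, "Webcam": 69.45, "Caltech": 41.13},
    "FedAvg": {"Amazon": 64.66, "DSLR": 64.00, "Webcam": 68.37, "Caltech": 44.76},
    "FedProx": {"Amazon": 62.41, "DSLR": 68.00, "Webcam": 79.18, "Caltech": 42.55},
    "FedGen": {"Amazon": 63.41, "DSLR": 66.00, "Webcam": 76.48, "Caltech": 37.71},
    "CFLFD": {"Amazon": 67.41, "DSLR": 73.50, "Webcam": 81.34, "Caltech": 44.90},
}

# Assumed definition of gain: accuracy minus the Local baseline, per domain.
baseline = table5["Local"]
gains = {
    alg: {dom: round(acc[dom] - baseline[dom], 2) for dom in acc}
    for alg, acc in table5.items()
    if alg != "Local"
}

# Algorithms that beat Local training in every domain.
always_better = [alg for alg, g in gains.items() if all(v > 0 for v in g.values())]

for alg, g in gains.items():
    print(alg, g)
print("Positive gain in all domains:", always_better)
```

Under this definition, CFLFD is the only listed algorithm whose gain over Local training is positive in all four domains.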