Survey of statistical heterogeneity in federated learning

doi:10.11772/j.issn.1001-9081.2024091316

Journal of Computer Applications, 2025, Vol. 45, Issue (9): 2737-2746. DOI: 10.11772/j.issn.1001-9081.2024091316

Artificial intelligence


Hao YU (1,2,3), Jing FAN (1,2,3), Yihang SUN (1,2,3), Hua DONG (1,2,3), Enkang XI (1,2,3)

1. School of Electrical and Information Technology, Yunnan Minzu University, Kunming, Yunnan 650504, China
2. Yunnan Key Laboratory of Unmanned Autonomous System (Yunnan Minzu University), Kunming, Yunnan 650504, China
3. Key Laboratory of Information and Communication Security and Disaster Recovery in Universities of Yunnan Province (Yunnan Minzu University), Kunming, Yunnan 650504, China

Received: 2024-09-18; Revised: 2024-12-09; Accepted: 2024-12-10; Online: 2025-01-13; Published: 2025-09-10
Contact: Jing FAN
About the authors:
YU Hao, born in 2000, from Xianning, Hubei, M. S. candidate, CCF member. His research interests include federated learning, distributed optimization, and edge computing.
SUN Yihang, born in 2001, Hui ethnicity, from Xuchang, Henan, M. S. candidate. His research interests include federated learning and privacy security.
DONG Hua, born in 2001, from Yuncheng, Shanxi, M. S. candidate. His research interests include federated learning and deep learning.
XI Enkang, born in 2000, from Zaozhuang, Shandong, M. S. candidate. His research interests include federated learning and privacy security.
Supported by:
National Natural Science Foundation of China (61540063); Youth Fund for Humanities and Social Sciences Research of Ministry of Education (20YJCZH129); Wu Zhonghai Expert Workstation Project (202305AF150045); Scientific Research Foundation of Education Department of Yunnan Province (2025Y0670); Yunnan Minzu University Master's Research and Innovation Fund (2022SKY004)

Abstract

Federated learning is a distributed machine learning framework that emphasizes privacy protection. However, it faces significant challenges from statistical heterogeneity. Statistical heterogeneity arises from differences in data distribution across participating nodes, and it may lead to biased model updates, degraded global-model performance, and unstable convergence. To address these problems, first, the main issues arising from statistical heterogeneity were analyzed in detail, covering inconsistent feature distributions, imbalanced label distributions, asymmetrical data sizes, and varying data quality. Second, existing solutions to statistical heterogeneity in federated learning were systematically reviewed, including local correction, clustering methods, client selection optimization, aggregation strategy adjustment, data sharing, knowledge distillation, and decoupling optimization, and the advantages, disadvantages, and applicable scenarios of each were evaluated. Finally, future research directions were discussed, such as device computing capacity awareness, model heterogeneity adaptation, optimization of privacy security mechanisms, and enhancement of cross-task transferability, thereby providing references for addressing statistical heterogeneity in practical applications.
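Imbalanced label distributions and asymmetrical data sizes are commonly simulated in experiments by splitting a centralized dataset with a Dirichlet distribution. The following is a minimal pure-Python sketch under that assumption; the function name `dirichlet_label_partition` and the concentration parameter `alpha` are illustrative choices, not taken from the survey. Smaller `alpha` gives each client fewer dominant classes and more unequal sample counts.

```python
import random
from collections import Counter


def dirichlet_label_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices among clients with Dirichlet label skew.

    For every class, client shares p ~ Dir(alpha, ..., alpha) are drawn and
    that class's samples are dealt out according to p. Small alpha yields
    strongly non-IID clients; large alpha approaches an IID split.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)

    # Group sample indices by label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)

        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        if total == 0.0:  # every draw underflowed; only plausible for tiny alpha
            props = [1.0 / num_clients] * num_clients
        else:
            props = [v / total for v in g]

        # Turn cumulative shares into non-decreasing cut points over this class.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(idxs), int(round(acc * len(idxs)))))
        cuts.append(len(idxs))

        start = 0
        for c, end in enumerate(cuts):
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]  # toy labels: 10 balanced classes
    for alpha in (0.1, 100.0):
        parts = dirichlet_label_partition(labels, num_clients=5, alpha=alpha)
        print(f"alpha={alpha}")
        for c, idxs in enumerate(parts):
            top = Counter(labels[i] for i in idxs).most_common(3)
            print(f"  client {c}: {len(idxs)} samples, top classes {top}")
```

Comparing the per-client class counts for `alpha=0.1` and `alpha=100.0` makes the skew visible. Because each class is split independently, client sizes also become unequal, so this scheme couples label skew with quantity skew rather than isolating one of them.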

Key words: federated learning, statistical heterogeneity, client drift, distributed learning, non-Independent and Identically Distributed (non-IID)
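Client drift and local correction can be illustrated with a deliberately tiny model. The sketch below is a hypothetical toy rather than any specific method from the survey: it runs FedAvg on two clients whose scalar quadratic losses f_k(w) = 0.5 * a_k * (w - b_k)^2 have different optima and curvatures, and optionally adds the proximal term (mu/2)(w - w_global)^2 used by FedProx as one example of local correction. The client parameters, learning rate, step and round counts are assumptions chosen for illustration.

```python
def local_train(w_global, a, b, lr, steps, mu):
    """Run `steps` of gradient descent on one client's toy loss.

    Loss: 0.5 * a * (w - b)**2, plus a FedProx-style proximal term
    0.5 * mu * (w - w_global)**2 when mu > 0, which pulls the local model
    back toward the current global model.
    """
    w = w_global
    for _ in range(steps):
        grad = a * (w - b) + mu * (w - w_global)
        w -= lr * grad
    return w


def federated_training(clients, rounds, lr, steps, mu, w0=0.0):
    """FedAvg: broadcast, train locally, then average weighted by data size."""
    total = sum(n for n, _, _ in clients)
    w = w0
    for _ in range(rounds):
        local = [(n, local_train(w, a, b, lr, steps, mu)) for n, a, b in clients]
        w = sum(n * wk for n, wk in local) / total
    return w


def global_optimum(clients):
    """Minimizer of sum_k n_k * f_k(w): the (n_k * a_k)-weighted mean of b_k."""
    num = sum(n * a * b for n, a, b in clients)
    den = sum(n * a for n, a, _ in clients)
    return num / den


# Hypothetical heterogeneous clients: (sample count n_k, curvature a_k, optimum b_k).
CLIENTS = [(50, 1.0, 0.0), (50, 4.0, 1.0)]

if __name__ == "__main__":
    w_star = global_optimum(CLIENTS)
    w_avg = federated_training(CLIENTS, rounds=50, lr=0.1, steps=20, mu=0.0)
    w_prox = federated_training(CLIENTS, rounds=50, lr=0.1, steps=20, mu=1.0)
    print(f"optimum={w_star:.3f} fedavg={w_avg:.3f} proximal={w_prox:.3f}")
```

With these toy numbers, plain FedAvg settles near 0.53 and the mu=1.0 variant near 0.62, against a true optimum of 0.8: many local steps let each client run toward its own optimum, and the proximal term limits that drift. A larger mu (within the range where local gradient descent stays stable) reduces the bias further but shrinks the progress made per round.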

CLC Number: TP399

Hao YU, Jing FAN, Yihang SUN, Hua DONG, Enkang XI. Survey of statistical heterogeneity in federated learning[J]. Journal of Computer Applications, 2025, 45(9): 2737-2746.

Figures/Tables (8)
Fig. 1 Process of federated learning
Fig. 2 Data distribution with statistical heterogeneity
Fig. 3 Influence of data distribution on accuracy
Fig. 4 Client drift phenomenon
Fig. 5 Types of existing solutions
Fig. 6 Basic principle of clustering method
Fig. 7 Global aggregation approach introducing shared data
Fig. 8 Client passing in cyclic topology structure

