Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (9): 2737-2746. DOI: 10.11772/j.issn.1001-9081.2024091316

• Artificial intelligence •

Survey of statistical heterogeneity in federated learning

Hao YU1,2,3, Jing FAN1,2,3, Yihang SUN1,2,3, Hua DONG1,2,3, Enkang XI1,2,3

  1. School of Electrical and Information Technology, Yunnan Minzu University, Kunming Yunnan 650504, China
  2. Yunnan Key Laboratory of Unmanned Autonomous System (Yunnan Minzu University), Kunming Yunnan 650504, China
  3. Key Laboratory of Information and Communication Security and Disaster Recovery in Universities of Yunnan Province (Yunnan Minzu University), Kunming Yunnan 650504, China
  • Received:2024-09-18 Revised:2024-12-09 Accepted:2024-12-10 Online:2025-01-13 Published:2025-09-10
  • Contact: Jing FAN
  • About author: YU Hao, born in 2000, from Xianning, Hubei, M. S. candidate, CCF member. His research interests include federated learning, distributed optimization, and edge computing.
    SUN Yihang, born in 2001, Hui ethnicity, from Xuchang, Henan, M. S. candidate. His research interests include federated learning and privacy security.
    DONG Hua, born in 2001, from Yuncheng, Shanxi, M. S. candidate. His research interests include federated learning and deep learning.
    XI Enkang, born in 2000, from Zaozhuang, Shandong, M. S. candidate. His research interests include federated learning and privacy security.
  • Supported by:
    National Natural Science Foundation of China (61540063); Youth Fund for Humanities and Social Sciences Research of Ministry of Education (20YJCZH129); Wu Zhonghai Expert Workstation Project (202305AF150045); Scientific Research Foundation of Education Department of Yunnan Province (2025Y0670); Yunnan Minzu University Master's Research and Innovation Fund (2022SKY004)

Abstract:

Federated learning is a distributed machine learning framework that emphasizes privacy protection, but it faces significant challenges in handling statistical heterogeneity. Statistical heterogeneity stems from differences in data distribution across participating nodes and can lead to biased model updates, degraded global model performance, and unstable convergence. To address these problems, the main issues caused by statistical heterogeneity are first analyzed in detail, including inconsistent feature distributions, imbalanced label distributions, asymmetric data sizes, and varying data quality. Next, existing solutions to statistical heterogeneity in federated learning are systematically reviewed, including local correction, clustering methods, client selection optimization, aggregation strategy adjustment, data sharing, knowledge distillation, and decoupled optimization, and the advantages, disadvantages, and applicable scenarios of each are evaluated. Finally, future research directions are discussed, such as device computing capacity awareness, model heterogeneity adaptation, optimization of privacy and security mechanisms, and enhancement of cross-task transferability, to provide a reference for addressing statistical heterogeneity in practical applications.
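
As a small illustration of the imbalanced label distributions named in the abstract, the sketch below splits a labeled dataset across clients so that each client sees a skewed mix of classes. It assumes Dirichlet-based partitioning, a common way to simulate non-IID label skew in federated learning experiments; the abstract does not attribute this technique to any of the reviewed methods. The function names and the parameters `alpha`, `num_clients`, and `seed` are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_by_label(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients; smaller alpha gives stronger label skew."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        shares = dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += shares[c]
            # The last client takes the remainder so that no index is dropped.
            if c == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[c].extend(indices[start:end])
            start = end
    return clients


def mean_max_share(labels, clients):
    """Average, over classes, of the largest fraction of a class held by one client."""
    class_totals = Counter(labels)
    per_client = [Counter(labels[i] for i in part) for part in clients]
    shares = [
        max(hist[y] for hist in per_client) / class_totals[y]
        for y in class_totals
    ]
    return sum(shares) / len(shares)


# Toy dataset (an assumption for this sketch): 1000 samples, 10 balanced classes.
labels = [i % 10 for i in range(1000)]
for alpha in (0.1, 100.0):
    parts = partition_by_label(labels, num_clients=5, alpha=alpha)
    print(f"alpha={alpha}: sizes={[len(p) for p in parts]}, "
          f"mean max class share={mean_max_share(labels, parts):.2f}")
```

With a small `alpha`, most samples of a class tend to land on one or two clients, so both the label mix and the data size differ between clients; with a large `alpha`, the split approaches a uniform (IID-like) partition.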

Key words: federated learning, statistical heterogeneity, client drift, distributed learning, non-Independent and Identically Distributed (non-IID)

CLC Number: