Journal of Computer Applications

Survey of statistical heterogeneity in federated learning

YU Hao1,2,3, FAN Jing1,2,3, SUN Yihang1,2,3, DONG Hua1,2,3, XI Enkang1,2,3   

  1. College of Electrical and Information Technology, Yunnan Minzu University
  2. Yunnan Key Laboratory of Unmanned Autonomous System (Yunnan Minzu University)
  3. Key Laboratory of Information and Communication Security and Disaster Recovery in Universities of Yunnan Province (Yunnan Minzu University)
  • Received: 2024-09-14 Revised: 2024-12-09 Online: 2025-01-13 Published: 2025-01-13
  • About the authors: YU Hao, born in 2000, M. S. candidate. His research interests include federated learning, distributed optimization, and edge computing. FAN Jing, born in 1976, Ph. D., professor. Her research interests include machine learning, pattern recognition, and internet of things. SUN Yihang, born in 2000, M. S. candidate. His research interests include federated learning and privacy security. DONG Hua, born in 2001, M. S. candidate. His research interests include federated learning and deep learning. XI Enkang, born in 2000, M. S. candidate. His research interests include federated learning and privacy security.
  • Supported by:
    National Natural Science Foundation of China (61540063); MOE (Ministry of Education in China) Project of Humanities and Social Sciences (20YJCZH129); Wu Zhonghai Expert Workstation (202305AF150045); Scientific Research Foundation of the Education Department of Yunnan Province, China (2023Y0499); Yunnan Minzu University Master’s Research and Innovation Fund Project (2022SKY004).

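  • Corresponding author: FAN Jing (范菁)
  • Author biographies: YU Hao (俞浩), born in 2000, male, from Xianning, Hubei, M. S. candidate, CCF student member. His research interests include federated learning, distributed optimization, and edge computing. FAN Jing (范菁), born in 1976, female (Dai ethnicity), from Xishuangbanna, Yunnan, professor, Ph. D., CCF member. Her research interests include machine learning, artificial intelligence, and internet of things. SUN Yihang (孙伊航), born in 2001, male (Hui ethnicity), from Xuchang, Henan, M. S. candidate. His research interests include federated learning and privacy security. DONG Hua (董华), born in 2001, male, from Yuncheng, Shanxi, M. S. candidate. His research interests include federated learning and deep learning. XI Enkang (郗恩康), born in 2000, male, from Zaozhuang, Shandong, M. S. candidate. His research interests include federated learning and privacy security.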

Abstract: Federated learning is a distributed machine learning framework that emphasizes privacy protection, but it faces significant challenges in addressing statistical heterogeneity. Statistical heterogeneity arises from differences in data distribution across participating nodes, which may lead to biased model updates, performance degradation of the global model, and unstable convergence. The main issues caused by statistical heterogeneity were analyzed in detail, including inconsistent feature distributions, imbalanced label distributions, asymmetrical data volumes, and varying data quality. A systematic review of existing solutions was provided, covering local correction, clustering methods, client selection optimization, aggregation strategy adjustments, data sharing, knowledge distillation, and decoupling optimization, with an evaluation of their advantages, disadvantages, and applicable scenarios. Furthermore, future research directions were discussed, such as awareness of device computing capacity, adaptation to model heterogeneity, optimization of privacy and security mechanisms, and enhancement of cross-task transferability, providing references for addressing statistical heterogeneity in practical applications.

Key words: federated learning, statistical heterogeneity, client drift, distributed learning, non-IID (non-Independent and Identically Distributed)
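
As an illustration only, two of the heterogeneity types named in the abstract (imbalanced label distributions and asymmetrical data volumes) can be made concrete with a small sketch. It is not taken from the surveyed paper: the toy client data, the names `label_distribution` and `total_variation`, and the choice of total variation distance as the skew measure are all assumptions made for this example.

```python
from collections import Counter


def label_distribution(labels, classes):
    """Empirical label distribution of one client, as a list of probabilities."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in classes]


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0 means identical distributions, 1 means disjoint support.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical toy data: three clients holding labels from classes 0..2.
clients = {
    "client_a": [0] * 8 + [1] * 2,               # mostly class 0
    "client_b": [1] * 5 + [2] * 5,               # holds no class 0 samples
    "client_c": [0] * 10 + [1] * 10 + [2] * 10,  # balanced, but 3x more data
}
classes = [0, 1, 2]

# Distribution of the pooled data, which no single party sees in federated learning.
all_labels = [y for labels in clients.values() for y in labels]
global_dist = label_distribution(all_labels, classes)

# Label skew: how far each client's label distribution is from the pooled one.
skew = {
    name: total_variation(label_distribution(labels, classes), global_dist)
    for name, labels in clients.items()
}

# Data-volume asymmetry: number of local samples per client.
sizes = {name: len(labels) for name, labels in clients.items()}

for name in clients:
    print(f"{name}: samples={sizes[name]}, label skew={skew[name]:.3f}")
```

In this toy setting the balanced client sits close to the pooled distribution while the two skewed clients sit far from it, which is the kind of gap that can bias local updates away from the global objective.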

CLC Number: