Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (7): 2073-2081. DOI: 10.11772/j.issn.1001-9081.2022071122

• The 39th CCF National Database Conference (NDBC 2022) •

Self-regularization optimization methods for Non-IID data in federated learning

Mengjie LAN, Jianping CAI, Lan SUN

  1. College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian 350108, China
  • Received: 2022-07-12 Revised: 2022-08-15 Accepted: 2022-08-17 Online: 2023-07-20 Published: 2023-07-10
  • Contact: Jianping CAI
  • About author: LAN Mengjie, born in 1998 in Sanming, Fujian, M. S. candidate, CCF student member. Her research interests include federated learning and differential privacy.
    CAI Jianping, born in 1990 in Zhangzhou, Fujian, Ph. D. candidate. His research interests include differential privacy, federated learning, and machine learning.
    SUN Lan, born in 1978 in Fuzhou, Fujian, M. S., lecturer. Her research interests include data security and privacy protection.

Abstract:

Federated Learning (FL) is a new distributed machine learning paradigm that breaks down data barriers while protecting the privacy of device data, thereby enabling participants to collaboratively train a machine learning model without sharing local data. However, handling the Non-Independent and Identically Distributed (Non-IID) data of different clients remains a major challenge for FL, and the existing solutions fail to exploit the implicit relationship between the local and global models, so they cannot solve the problem simply and efficiently. To address the Non-IID problem of different clients in FL, two new FL optimization algorithms, Federated Self-Regularization (FedSR) and Dynamic Federated Self-Regularization (Dyn-FedSR), were proposed. In FedSR, a self-regularization penalty term was introduced in each training round to modify the local loss function dynamically; by building a relationship between the local model and the global model, which aggregates rich knowledge, the local model was pulled toward the global model, thereby alleviating the client drift caused by Non-IID data. In Dyn-FedSR, the coefficient of the self-regularization term was determined dynamically from the similarity between the local and global models. Extensive experiments on different tasks demonstrate that FedSR and Dyn-FedSR significantly outperform FL algorithms such as the Federated Averaging (FedAvg) algorithm, the Federated Proximal (FedProx) optimization algorithm, and the Stochastic Controlled Averaging algorithm (SCAFFOLD) in various scenarios, achieving communication efficiency and high accuracy while remaining robust to imbalanced data and uncertain local updates.
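
To make the mechanism concrete, below is a minimal PyTorch sketch of a self-regularized local update in the spirit of FedSR and Dyn-FedSR. The proximal-style squared-distance penalty, the cosine-similarity weighting of the dynamic coefficient, and the helper names (flatten_params, local_update) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a self-regularized local update (FedSR / Dyn-FedSR style).
# Assumptions: the penalty is a squared distance to the global model, and the
# Dyn-FedSR coefficient shrinks while the local model stays similar to the
# global one; the paper's exact formulas may differ.
import torch
import torch.nn.functional as F

def flatten_params(model):
    """Concatenate all model parameters into a single detached vector."""
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def local_update(model, global_model, loader, lr=0.01, mu=0.1, dynamic=False):
    """One client's local round: task loss plus a self-regularization penalty."""
    global_params = [p.detach().clone() for p in global_model.parameters()]
    global_vec = flatten_params(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in loader:
        coeff = mu
        if dynamic:  # Dyn-FedSR: weight the penalty by current dissimilarity
            sim = F.cosine_similarity(flatten_params(model), global_vec, dim=0)
            coeff = mu * (1.0 - sim.item())
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        # Self-regularization: pull the local model toward the global model
        penalty = sum((p - g).pow(2).sum()
                      for p, g in zip(model.parameters(), global_params))
        (loss + 0.5 * coeff * penalty).backward()
        opt.step()
    return model.state_dict()
```

In this sketch the dynamic coefficient grows as the local model drifts away from the global model within a round, which matches the stated intuition of penalizing client drift more strongly when local and global models disagree.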

Key words: Federated Learning (FL), Non-Independent and Identically Distributed (Non-IID), client drift, regularization, distributed machine learning, privacy-preserving

CLC number: