Journal of Computer Applications, 2024, Vol. 44, Issue (11): 3345-3353. DOI: 10.11772/j.issn.1001-9081.2023111693

• Artificial intelligence •

Personalized federated learning based on similarity clustering and regularization

Jie WU, Xuezhong QIAN, Wei SONG

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2023-12-08; Revised: 2024-03-06; Accepted: 2024-03-14; Online: 2024-03-22; Published: 2024-11-10
  • Contact: Jie WU (巫婕)
  • About author: QIAN Xuezhong (钱雪忠), born in 1967, a native of Wuxi, Jiangsu, M. S., associate professor, CCF member. His research interests include data mining, machine learning, and artificial intelligence.
    SONG Wei (宋威), born in 1981, a native of Enshi, Hubei, Ph. D., professor. His research interests include data mining, machine learning, and pattern recognition.
  • Supported by:
    National Natural Science Foundation of China (62076110)

Abstract:

In Federated Learning (FL) applications, client data are often heterogeneous, and different task requirements call for personalized models. However, some existing Personalized Federated Learning (PFL) algorithms face a trade-off between personalization and global generalization, and most of them retain the traditional FL approach of weighting aggregation by client data volume, which degrades model performance for clients whose data distributions differ significantly and leaves the aggregation strategy unpersonalized. To address these problems, a PFL algorithm based on similarity clustering and regularization, named pFedSCR, is proposed. In the client local update phase, pFedSCR trains both a personalized model and a local model; the personalized model adds L2-norm regularization to the cross-entropy loss function to dynamically adjust how closely it follows the global model, thereby achieving personalization while still learning global knowledge. In the server aggregation phase, an aggregation weight matrix is constructed by clustering client model updates according to their similarity, and the aggregation weights are adjusted dynamically to aggregate a personalized model for each client, so that the parameter aggregation strategy is personalized and the data heterogeneity problem is addressed at the same time. Multiple non-independent and identically distributed (Non-IID) data scenarios were simulated with the Dirichlet distribution on three datasets: CIFAR-10, MNIST and Fashion-MNIST. Experimental results show that, compared with FL algorithms including the classic algorithm FedProx and the latest personalized algorithm FedPCL (Federated Prototype-wise Contrastive Learning), pFedSCR achieves higher accuracy and communication efficiency in all of these scenarios, reaching an accuracy of up to 99.03%.
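The personalized objective described in the abstract, a cross-entropy loss with an L2-norm term that ties the personalized model to the global model, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formula or how the regularization strength is adjusted over time, so the proximal form below and the names `personalized_loss`, `prox_gradient`, `theta`, `w_global`, and `lam` are assumptions made for illustration only.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def personalized_loss(logits, labels, theta, w_global, lam):
    """Cross-entropy plus an L2 penalty pulling `theta` toward `w_global`.

    Assumed form: CE + (lam / 2) * ||theta - w_global||^2.
    `lam` is a hypothetical fixed coefficient; the paper adjusts the
    degree of reference to the global model dynamically.
    """
    prox = 0.5 * lam * np.sum((theta - w_global) ** 2)
    return cross_entropy(logits, labels) + prox

def prox_gradient(theta, w_global, lam):
    """Gradient of the penalty term only: lam * (theta - w_global)."""
    return lam * (theta - w_global)
```

With `lam = 0` the objective reduces to plain local training, and a larger `lam` keeps the personalized parameters closer to the global model, which is one way to read the trade-off between personalization and global generalization that the abstract mentions.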

Key words: Federated Learning (FL), Non-Independent and Identically Distributed (Non-IID), cosine similarity, regularization, Personalized Federated Learning (PFL), privacy security
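As a rough illustration of similarity-based personalized aggregation, the sketch below computes pairwise cosine similarity between flattened client updates and turns each row into aggregation weights. It is a simplification, not the pFedSCR procedure: the clustering step is omitted, and the row-wise softmax, the `temperature` parameter, and all function names are assumptions introduced here.

```python
import numpy as np

def cosine_similarity_matrix(updates):
    """Pairwise cosine similarity between rows of `updates` (clients x params)."""
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    unit = updates / np.maximum(norms, 1e-12)  # guard against all-zero updates
    return unit @ unit.T

def aggregation_weights(updates, temperature=1.0):
    """Row-stochastic matrix: row i holds client i's weights over all clients.

    The row-wise softmax over similarities is an illustrative choice only.
    """
    sim = cosine_similarity_matrix(updates) / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(sim)
    return exp / exp.sum(axis=1, keepdims=True)

def personalized_aggregate(client_params, weights):
    """Row i is the weighted average of all clients' parameters for client i."""
    return weights @ client_params
```

In this sketch, clients whose updates point in similar directions contribute more to each other's aggregated model, in contrast to weighting by data volume alone, where every client receives the same global average.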

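Non-IID scenarios simulated through a Dirichlet distribution are commonly built by drawing, for each class, the share of samples that every client receives. The sketch below shows that common label-skew recipe; the abstract does not state the concentration values or the exact partitioning procedure used in the experiments, so `dirichlet_partition`, `alpha`, and `seed` are hypothetical names and the procedure is an assumption.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with label skew controlled by `alpha`.

    For each class, client shares are drawn from Dirichlet(alpha, ..., alpha);
    a smaller `alpha` gives more heterogeneous (more Non-IID) clients.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        # Shuffle this class's sample indices before cutting them up.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert cumulative shares into cut points within this class.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```

For example, `dirichlet_partition(labels, num_clients=5, alpha=0.5)` assigns every sample index to exactly one of five clients, with class proportions that vary from client to client.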

CLC Number: