Survey of privacy-preserving technologies in federated learning

WANG Teng¹, HUO Zheng², HUANG Yaxin³, FAN Yilin³

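1. CETC Network Communication Research Institute
2. Hebei University of Economics and Business
3. School of Information Technology, Hebei University of Economics and Business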

Received: 2021-12-07  Revised: 2022-01-21  Online: 2022-03-03
Corresponding author: HUO Zheng
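Funding: Research on a federated learning model for user profiling with enhanced privacy protection; Research on key privacy-preserving technologies in user network-behavior profiling; Research and application of intelligent task scheduling methods under "cloud-edge-device" collaboration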

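Abstract: The performance of machine learning improves substantially as the scale of data grows. However, data privacy leakage and data silos limit the data sources available to centralized machine learning. In recent years, federated learning has emerged as a new approach to the difficulty of sharing data in machine learning. The federated learning architecture does not require the parties to share their data: each participant trains a local model on its local data and periodically uploads the model parameters to a server, which updates the global model, so that a machine learning model built on large-scale global data is obtained. Because of this inherent data-privacy property, federated learning is a promising solution for future large-scale machine learning. However, the privacy protection mechanisms provided by the federated learning architecture are insufficient, and data privacy may leak in both the model training phase and the model prediction phase. Strengthening the privacy protection mechanisms in federated learning has therefore become a new research hotspot. Starting from the privacy leakage problems in federated learning, this paper discusses attack models and the channels through which sensitive information leaks. It then reviews several classes of privacy-preserving techniques in federated learning: techniques based on differential privacy, on homomorphic encryption, on secure multi-party computation, and on blockchain. Finally, it discusses several key issues of privacy protection in federated learning and outlines future research directions.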
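
The training loop described above (local training, periodic parameter upload, server-side update of the global model) can be sketched in a few lines. This is a minimal illustration and not the authors' method. It assumes a FedAvg-style server that averages client parameters weighted by local data size, and a one-dimensional linear model trained with plain gradient descent. The names `local_train`, `fedavg_round`, `make_client_data` and `run_demo`, and the synthetic data, are hypothetical. Only the parameters `(w, b)` leave each participant; the raw `(x, y)` pairs never do.

```python
import random

def local_train(w, b, data, lr=0.1, epochs=5):
    """Participant side: a few epochs of full-batch gradient descent on private data
    for the 1-D linear model y = w*x + b with mean-squared-error loss."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def fedavg_round(global_params, clients):
    """Server side: send the global (w, b) to every participant, collect the locally
    trained parameters, and average them weighted by each participant's data size."""
    total = sum(len(data) for data in clients)
    new_w = new_b = 0.0
    for data in clients:
        w, b = local_train(*global_params, data)  # only (w, b) is uploaded
        new_w += w * len(data) / total
        new_b += b * len(data) / total
    return new_w, new_b

def make_client_data(n, rng):
    """Hypothetical private dataset: n noise-free samples of y = 3x + 1."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0) for x in xs]

def run_demo(rounds=100, seed=0):
    """Three hypothetical participants with 20, 50 and 30 private samples."""
    rng = random.Random(seed)
    clients = [make_client_data(n, rng) for n in (20, 50, 30)]
    params = (0.0, 0.0)
    for _ in range(rounds):
        params = fedavg_round(params, clients)
    return params

if __name__ == "__main__":
    print(run_demo())  # approaches (3.0, 1.0)
```

Weighting by data size keeps a participant with few samples from pulling the global model as hard as one with many. Real systems also sample a subset of participants per round, which this sketch omits.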

Keywords: federated learning, privacy preservation, differential privacy, homomorphic encryption, secure multi-party computation

Abstract: The performance of machine learning algorithms improves as the scale of data grows. However, personal privacy may be leaked when large-scale, multi-source user data are collected and machine learning algorithms are run on them. As individuals and governments pay increasing attention to personal information, the data sources available to centralized machine learning are increasingly restricted. In recent years, federated learning has become a new way to address the privacy problem in machine learning. The federated learning architecture does not require training on large-scale data in a data center. Instead, participants train local models on their local data and periodically upload parameters to the server, which updates the global model. Because the model is trained in a decentralized way on the nodes where the data are stored and only model parameters are passed to the server, the server never obtains the raw data, which gives personal data a degree of protection. Data privacy and security now attract wide attention, and federated learning avoids concentrating data in one place, which reduces both data leakage and attacks on a central data store. In addition, traditional machine learning models cannot directly handle heterogeneous data. With federated learning, a machine learning model over global data can be built without sharing data, which both protects data privacy and helps address the data heterogeneity problem. However, the privacy protection provided by the federated learning architecture itself is insufficient and can still lead to disclosure of the training data. At present, strengthening the privacy protection mechanisms in federated learning has become a new research hotspot.
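
Why uploading parameters instead of data is not enough can be seen from a textbook case. For a linear model trained with squared error on a single example, the gradient with respect to the weights equals the bias gradient times the input. Anyone who sees the update can divide one by the other and recover the private input exactly. The sketch below assumes a batch of one, a linear model and hypothetical function names (`gradients`, `reconstruct_input`) with made-up values. With larger batches or deep models the attack is harder and only approximate, so this illustrates the leakage channel rather than a complete attack.

```python
def gradients(w, b, x, y):
    """Gradients of the squared error (w.x + b - y)^2 for ONE training example,
    i.e. what a participant would upload after a batch-of-one step."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    grad_w = [2.0 * residual * xi for xi in x]   # equals grad_b * x
    grad_b = 2.0 * residual
    return grad_w, grad_b

def reconstruct_input(grad_w, grad_b):
    """Curious-server attack: because grad_w = grad_b * x, dividing elementwise
    recovers the private input x. Fails only when the residual is exactly zero."""
    if grad_b == 0.0:
        raise ValueError("zero residual: this update carries no information about x")
    return [g / grad_b for g in grad_w]

if __name__ == "__main__":
    w, b = [0.5, -0.2, 0.1], 0.0                 # current global model (hypothetical)
    private_x, private_y = [4.0, 2.0, 7.0], 1.0  # a participant's private record (hypothetical)
    grad_w, grad_b = gradients(w, b, private_x, private_y)
    print(reconstruct_input(grad_w, grad_b))     # recovers private_x up to rounding
```
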
Starting from the problem of privacy disclosure in federated learning, this paper discusses the attack models in federated learning and the ways in which private information is disclosed. It focuses on privacy protection technologies in federated learning: mechanisms based on secure multi-party computation, techniques based on homomorphic encryption, techniques based on differential privacy, and techniques based on blockchain. Finally, it discusses key issues in strengthening privacy protection in federated learning, focusing on model convergence and performance, personalized privacy requirements, and related problems.
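
As an illustration of the secure multi-party computation family, the following sketch applies additive secret sharing to the aggregation of client updates, one common way to build secure aggregation. It is a simplified sketch under stated assumptions, not the protocol of any surveyed work. All parties are assumed honest-but-curious and online for the whole round. The modulus `PRIME` and fixed-point scale `SCALE` are illustrative choices. Shares are assumed to travel over private channels, which is simulated here inside one process. The helper names (`encode`, `decode`, `share`, `secure_sum`) are hypothetical. Each party sees only uniformly random shares of any individual update; only the sum is revealed.

```python
import secrets

PRIME = 2**61 - 1   # field modulus (illustrative; must exceed twice the largest |sum|)
SCALE = 10**6       # fixed-point scale for real-valued updates (illustrative)

def encode(x):
    """Map a real number to a field element via fixed-point rounding."""
    return round(x * SCALE) % PRIME

def decode(v):
    """Inverse of encode; elements above PRIME // 2 represent negative numbers."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE

def share(value, n_parties):
    """Split a field element into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(client_updates):
    """Every client secret-shares each coordinate of its update. Party j receives
    only the j-th share of every value, adds them up, and publishes that partial
    sum. Combining the partial sums reveals the total and nothing else."""
    n = len(client_updates)
    dim = len(client_updates[0])
    partial = [[0] * dim for _ in range(n)]  # partial[j][k]: party j, coordinate k
    for update in client_updates:
        for k, x in enumerate(update):
            for j, s in enumerate(share(encode(x), n)):
                partial[j][k] = (partial[j][k] + s) % PRIME
    totals = [sum(partial[j][k] for j in range(n)) % PRIME for k in range(dim)]
    return [decode(t) for t in totals]
```

For example, `secure_sum([[0.5, -1.25], [0.25, 2.0], [1.0, 0.75]])` returns `[1.75, 1.5]`, and the server can divide by the number of clients to obtain the averaged update. Practical secure-aggregation protocols also tolerate clients that drop out mid-round, which this sketch does not.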

Key words: Federated learning, privacy preserving, differential privacy, homomorphic encryption, secure multiparty computation

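WANG Teng, HUO Zheng, HUANG Yaxin, FAN Yilin. Survey of privacy-preserving technologies in federated learning[J]. Journal of Computer Applications.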
