A Survey of Privacy Protection Techniques in Federated Learning

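WANG Teng (王腾)1, HUO Zheng (霍峥)2, HUANG Yaxin (黄亚鑫)3, FAN Yilin (范艺琳)3

  1. Network Communication Research Institute of China Electronics Technology Group Corporation (中国电科网络通信研究院)
  2. Hebei University of Economics and Business (河北经贸大学)
  3. School of Information Technology, Hebei University of Economics and Business (河北经贸大学信息技术学院)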
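  • Received: 2021-12-07 Revised: 2022-01-21 Published: 2022-03-03
  • Corresponding author: HUO Zheng (霍峥)
  • Funding:
    Research on a privacy-enhanced federated learning model for user profiling; Research on key privacy protection technologies for profiling users' network behavior; Research and application of intelligent task scheduling methods under "cloud-edge-end" collaboration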

Survey of privacy-preserving technologies in federated learning

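Summary: The performance of machine learning improves substantially as the scale of data grows. However, data privacy leakage and data silos limit the data sources available to centralized machine learning. In recent years, federated learning has become a new approach to the difficulty of sharing data in machine learning. A federated learning architecture does not require multiple parties to share data resources: each participant trains a local model on its local data and periodically uploads parameters to a server to update the global model, which yields a machine learning model built on large-scale global data. The federated learning architecture is inherently privacy-preserving and is a new scheme for future machine learning on large-scale data. However, the privacy protection mechanisms that the architecture provides are insufficient, and data privacy can leak in both the model training phase and the model prediction phase. Strengthening the privacy protection mechanisms in federated learning architectures has now become a new research hotspot. Starting from the privacy leakage problems in federated learning, this paper examines the attack models and the channels through which sensitive information leaks. It focuses on surveying several classes of privacy protection techniques in federated learning: techniques based on differential privacy, on homomorphic encryption, on secure multi-party computation, and on blockchain. Finally, it discusses several key issues of privacy protection in federated learning and looks ahead to future research directions.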

Keywords: federated learning, privacy protection, differential privacy, homomorphic encryption, secure multi-party computation

Abstract: The performance of machine learning algorithms improves as the scale of data increases. However, personal privacy may be leaked when large-scale, multi-source user data are collected and machine learning algorithms are run on them. With the growing attention of individuals and governments to personal information, the data sources available to centralized machine learning are limited by objective conditions. In recent years, federated learning has become a new way to solve the privacy problem in machine learning. The federated learning architecture does not need to train on large-scale data in a data center; participants train local models on local data and periodically upload parameters to the server to update the global model. Federated learning is privacy-preserving by nature: models are trained in a decentralized way on the nodes where the data are stored, and only model parameters are passed to the server. The server cannot obtain the original data, so raw personal data are not directly exposed. As data privacy and security attract increasing attention, federated learning offers the advantage of avoiding data leakage and attacks on centralized data. In addition, traditional machine learning models cannot directly deal with heterogeneous data. With federated learning, a machine learning model over the global data can be established without sharing data, which not only protects data privacy but also addresses the data heterogeneity problem. However, the privacy protection mechanism provided by the federated learning architecture is insufficient and can easily lead to disclosure of the training data. At present, strengthening the privacy protection mechanisms in the federated learning architecture has become a new research hotspot.
Starting from the problem of privacy disclosure in federated learning, this paper discusses the attack models and the ways in which private information is disclosed. It then focuses on privacy protection technologies in federated learning: those based on differential privacy, homomorphic encryption, secure multi-party computation, and blockchain. Finally, this paper discusses key issues in strengthening privacy protection in federated learning, focusing on model convergence and performance, personalized privacy requirements, etc.

Key words: Federated learning, privacy preserving, differential privacy, homomorphic encryption, secure multiparty computation