Differential privacy clustering algorithm in horizontal federated learning

Journal of Computer Applications, 2024, Vol. 44, Issue (1): 217-222. DOI: 10.11772/j.issn.1001-9081.2023010019

Special topic: Cyberspace security

Xueran XU¹, Geng YANG¹,², Yuxian HUANG¹

1. School of Computer Science, School of Software, School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China
2. Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing Jiangsu 210023, China

Received: 2023-01-09 Revised: 2023-04-13 Accepted: 2023-04-14 Online: 2023-06-06 Published: 2024-01-10
Contact: Xueran XU
About the authors: XU Xueran (first contact), born in 2000 in Xuzhou, Jiangsu, female, M.S. candidate. Her research interests include network and information security, privacy protection.
YANG Geng, born in 1961 in Jianhu, Jiangsu, male, Ph.D., professor, doctoral supervisor. His research interests include artificial intelligence security, data privacy protection, cloud computing and security.
HUANG Yuxian, born in 1998 in Nantong, Jiangsu, male, Ph.D. His research interests include differential privacy, network and information security, federated learning.
Supported by:
National Natural Science Foundation of China (61872197)

Abstract:

Clustering analysis can uncover hidden relationships in data and partition the data according to multiple indicators, which facilitates personalized and fine-grained operations. However, the data fragmentation and isolation caused by data islands seriously degrade the effectiveness of clustering analysis in practice. To solve the data island problem while protecting the privacy of the data involved, an Equivalent Local differential privacy Federated K-means (ELFedKmeans) algorithm was proposed. For the horizontal federated learning setting, a grid-based initial cluster center selection method and a privacy budget allocation scheme were designed. In the ELFedKmeans algorithm, all sites jointly negotiated random seeds so that the same random noise could be generated at low communication cost, which protected the privacy of local data. Theoretical analysis proved that the algorithm satisfies differential privacy, and it was compared with the Local Differential Privacy distributed K-means (LDPKmeans) algorithm and the Hybrid Privacy K-means (HPKmeans) algorithm on different datasets. Experimental results show that, as the privacy budget increases, the F-measure values of all three algorithms gradually increase and their Sum of Squares due to Error (SSE) values gradually decrease. Overall, the F-measure values of ELFedKmeans were 1.7945% to 57.0663% and 21.2452% to 132.0488% higher than those of LDPKmeans and HPKmeans, respectively; the Log(SSE) values of ELFedKmeans were 1.2042% to 12.8946% and 5.6175% to 27.5752% lower than those of LDPKmeans and HPKmeans, respectively. Under the same privacy budget, ELFedKmeans outperforms the compared algorithms in terms of clustering quality and utility.

Key words: horizontal federated clustering, differential privacy, local disturbance, utility, K-means algorithm
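
The abstract mentions two mechanisms that a small example may make more concrete: a privacy budget allocation scheme, and sites agreeing on a random seed so that each can generate the same noise locally. The following is only a minimal sketch under stated assumptions, not the paper's implementation. The equal split of the budget across iterations, the use of Laplace noise on per-cluster sums and counts, and all function names (`per_round_budget`, `shared_laplace_noise`, `perturbed_local_stats`) are assumptions made for illustration; the abstract does not specify these details.

```python
import random
from typing import List, Sequence, Tuple


def per_round_budget(total_epsilon: float, rounds: int) -> float:
    """Assumed allocation: split the total privacy budget equally over rounds."""
    return total_epsilon / rounds


def shared_laplace_noise(seed: int, n: int, scale: float) -> List[float]:
    """Draw n Laplace(0, scale) samples from a generator seeded with `seed`.

    Any site that knows the same seed reproduces the same samples.
    The difference of two i.i.d. exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    rng = random.Random(seed)
    rate = 1.0 / scale
    return [rng.expovariate(rate) - rng.expovariate(rate) for _ in range(n)]


def perturbed_local_stats(
    points: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    seed: int,
    sensitivity: float,
    epsilon: float,
) -> Tuple[List[List[float]], List[float]]:
    """Assign local points to the nearest center, then add Laplace noise
    (scale = sensitivity / epsilon) to per-cluster sums and counts."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0.0] * k
    for p in points:
        # Index of the nearest center by squared Euclidean distance.
        j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        counts[j] += 1.0
        for i in range(d):
            sums[j][i] += p[i]
    noise = shared_laplace_noise(seed, k * (d + 1), sensitivity / epsilon)
    for j in range(k):
        base = j * (d + 1)
        for i in range(d):
            sums[j][i] += noise[base + i]
        counts[j] += noise[base + d]
    return sums, counts


if __name__ == "__main__":
    eps_t = per_round_budget(total_epsilon=1.0, rounds=5)
    # Two sites holding the same seed derive identical noise vectors.
    print(shared_laplace_noise(42, 3, 3.0 / eps_t) == shared_laplace_noise(42, 3, 3.0 / eps_t))
```

For data scaled to the unit cube in d dimensions, a common choice for `sensitivity` when perturbing sums and counts together is d + 1. Note that `random.Random` is not a cryptographically secure generator, so a real protocol would need a secure seed agreement step and a secure noise source.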

CLC number: TP309.2

Xueran XU, Geng YANG, Yuxian HUANG. Differential privacy clustering algorithm in horizontal federated learning[J]. Journal of Computer Applications, 2024, 44(1): 217-222.

Figures and tables (5)

Fig. 1 Scene of horizontal federated K-means system

Fig. 2 Example of grid division

Tab. 1 Datasets used in experiments

Dataset	Number of samples	Number of attributes	Number of clusters K
Banana	5 300	2	2
Magic	19 020	11	2
Adult	48 841	6	5
3D Road Network	434 874	4	10

Fig. 3 Comparison of F-measure values under different ε values on different datasets

Fig. 4 Comparison of Log(SSE) values under different ε values on different datasets
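
Fig. 3 and Fig. 4 compare the algorithms by F-measure and Log(SSE). As a rough illustration of how such metrics can be computed, the sketch below uses the usual definition of SSE (the sum of squared Euclidean distances from each point to its assigned cluster center) and one common class-weighted definition of the clustering F-measure. The exact F-measure variant and the logarithm base used in the paper are not stated on this page, so both are assumptions here, as are the function names.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def sse(
    points: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[j]))
        for p, j in zip(points, labels)
    )


def clustering_f_measure(
    true_labels: Sequence[Hashable],
    cluster_labels: Sequence[Hashable],
) -> float:
    """Class-weighted F-measure: for each true class take the best F1 over
    all clusters, then average weighted by class size (assumed variant)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap.get((c, k), 0)
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
    ctr = [(0.0, 1.0), (4.0, 4.0)]
    value = sse(pts, ctr, [0, 0, 1])
    # Base-10 logarithm is an assumption; the page does not state the base.
    print(value, math.log10(value))
    print(clustering_f_measure([0, 0, 1, 1], [1, 1, 0, 0]))
```

A higher F-measure and a lower SSE both indicate better clustering, which matches the direction of the trends reported in the abstract.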
