Multi-party privacy preserving k-means clustering scheme based on blockchain

doi:10.11772/j.issn.1001-9081.2021091640

Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3801-3812.

Topic: Cyberspace security

Le ZHAO¹,², En ZHANG¹,², Leiyong QIN¹,², Gongli LI¹,²

¹ College of Computer and Information Engineering, Henan Normal University, Xinxiang Henan 453007, China
² Engineering Lab of Intelligence Business and Internet of Things of Henan Province (Henan Normal University), Xinxiang Henan 453007, China

Received: 2021-09-17 Revised: 2022-01-10 Accepted: 2022-01-19 Online: 2022-04-15 Published: 2022-12-10
Corresponding author: En ZHANG
About authors: ZHAO Le, born in 1997, from Zhumadian, Henan, M. S. candidate. Her research interests include information security and cryptography.
QIN Leiyong, born in 1997, from Shangqiu, Henan, M. S. candidate. His research interests include information security and cryptography.
LI Gongli, born in 1981, from Xinyang, Henan, Ph. D., associate professor, CCF member. Her research interests include information security and cryptography.
Supported by:
National Natural Science Foundation of China (U1604156); Science and Technology Research Program of Henan Province (192102210131); Soft Science Research Project of Henan Province (212400410109)

Abstract:

Existing privacy-preserving k-means clustering schemes have three problems: their iteration efficiency is low; in schemes based on centralized differential privacy, the server may be attacked; and in schemes based on local differential privacy, the server may return incorrect clustering results. To solve these problems, a Multi-party Privacy-Preserving k-means Clustering Scheme based on Blockchain (M-PPkCS/B) was proposed, which combines the advantages of local differential privacy with the openness, transparency and tamper resistance of blockchain. Firstly, a Multi-party k-means Clustering Center Initialization Algorithm (M-kCCIA) was designed, which improves the iteration efficiency of clustering while protecting user privacy and ensures that the initial clustering centers jointly generated by the users are correct. Then, a Blockchain-based Privacy-Preserving k-means Clustering Algorithm (Bc-PPkCA) was designed, in which a smart contract implementing the clustering center update algorithm iteratively updates the clustering centers on the blockchain, so that every user obtains the correct clustering results. Experimental results on the HTRU2 and Abalone datasets show that, while every user obtains the correct clustering results, the accuracy reaches 97.53% and 96.19% respectively, and that, compared with Random Selection (RS), an algorithm that generates the initial clustering centers randomly, M-kCCIA reduces the average number of iterations by 5.68 and 2.75 on the two datasets respectively.

Key words: k-means clustering, privacy preserving, local differential privacy, blockchain, smart contract

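The abstract states that a smart contract updates the cluster centers on the blockchain from the users' contributions. Using the per-user attribute-value sums $Sum_{i,q}$ and counts $Num_{i,q}$ from Tab. 1, a natural update rule is $c_q = \sum_i Sum_{i,q} / \sum_i Num_{i,q}$. The sketch below shows only that aggregation step in plain Python, assuming the sums and counts have already been recovered from the users' perturbed submissions. It is not the paper's contract code, and the function name `update_centers` is hypothetical.

```python
from typing import List, Optional

Vector = List[float]

def update_centers(sums: List[List[Vector]], counts: List[List[float]]) -> List[Optional[Vector]]:
    """Aggregate per-user cluster statistics into new global centers (hypothetical helper).

    sums[i][q]:   attribute-value sum vector of user i's cluster q (Sum_{i,q})
    counts[i][q]: number of user i's points in cluster q (Num_{i,q})
    Returns one center per cluster, or None when a cluster's total count is not positive.
    """
    num_users = len(sums)
    k = len(sums[0])
    dim = len(sums[0][0])
    centers: List[Optional[Vector]] = []
    for q in range(k):
        total = [0.0] * dim
        n = 0.0
        for i in range(num_users):
            for j in range(dim):
                total[j] += sums[i][q][j]
            n += counts[i][q]
        # Noisy (perturbed) counts may be zero or negative; leave such clusters undefined.
        centers.append([t / n for t in total] if n > 0 else None)
    return centers

# Two users, two clusters, two-dimensional data.
sums = [[[2.0, 4.0], [10.0, 10.0]],
        [[4.0, 2.0], [20.0, 30.0]]]
counts = [[2, 2], [2, 3]]
print(update_centers(sums, counts))  # [[1.5, 1.5], [6.0, 8.0]]
```

Because the contract only needs sums and counts, each user can keep raw records local and submit aggregated statistics per cluster.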
CLC number: TP309.2

Le ZHAO, En ZHANG, Leiyong QIN, Gongli LI. Multi-party privacy preserving k-means clustering scheme based on blockchain[J]. Journal of Computer Applications, 2022, 42(12): 3801-3812.

Figures/Tables (13)

Fig. 1 Architecture of blockchain

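The tamper resistance that the scheme relies on comes from each block storing the hash of its predecessor, so altering an earlier block breaks every later link. The toy sketch below, using Python's standard `hashlib` and `json` modules, illustrates only that hash-chaining idea. It omits consensus, signatures and smart-contract execution, and the block fields (`index`, `prev_hash`, `data`) are assumptions, not the layout shown in Fig. 1.

```python
import hashlib
import json
from typing import Dict, List

def block_hash(block: Dict) -> str:
    """SHA-256 over a canonical (key-sorted) JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: List[Dict], data: str) -> None:
    """Append a block that commits to the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis block uses a zero hash
    chain.append({"index": len(chain), "prev_hash": prev, "data": data})

def is_valid(chain: List[Dict]) -> bool:
    """Check that every block's prev_hash matches the recomputed hash of the block before it."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain: List[Dict] = []
append_block(chain, "initial centers")
append_block(chain, "round 1 centers")
append_block(chain, "round 2 centers")
print(is_valid(chain))             # True
chain[1]["data"] = "forged centers"
print(is_valid(chain))             # False: block 2 no longer matches block 1
```

This check alone cannot detect a change to the newest block, which has no successor; in a real blockchain, replication and consensus among nodes protect that block as well.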
Tab. 1 Symbols and their meanings

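Symbol	Meaning
$DO_1, DO_2, \cdots, DO_m$	The $m$ users
$n_i$	Number of data records held by the $i$-th user
$d_{i,1}, d_{i,2}, \cdots, d_{i,n_i}$	Data records of the $i$-th user
$\lVert a-b \rVert^2$	Squared Euclidean distance between $a$ and $b$
$c_{i,q}$	The $q$-th cluster center of the $i$-th user
$\tilde{c}_q$	The perturbed $q$-th cluster center
$l, l_1, l_2$	Bit lengths of the binary strings encoding an initial cluster center, an attribute-value sum and a count sum, respectively
$\tilde{c}_{q,0}, \tilde{c}_{q,1}, \cdots, \tilde{c}_{q,l-1}$	Perturbed binary string of the $q$-th cluster center
$\varepsilon_1$	Privacy budget for a user's initial cluster centers
$\varepsilon_b, \varepsilon_a$	Privacy budgets for a user's cluster-center updates
$f_{1,p}, f_{b,p}, f_{a,p}$	Response probability of the $p$-th bit under privacy budgets $\varepsilon_1$, $\varepsilon_b$ and $\varepsilon_a$, respectively
$Sum_{i,q}, Num_{i,q}$	Attribute-value sum and count sum of the $q$-th cluster of the $i$-th user
$\tilde{B}Sum_{i,q}, \tilde{A}Num_{i,q}$	Perturbed binary strings of the attribute-value sum and count sum of the $q$-th cluster of the $i$-th user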

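Tab. 1 indicates that each value (an initial cluster center, an attribute-value sum or a count sum) is encoded as a binary string, and each bit $p$ is reported with a response probability such as $f_{1,p}$. The paper's exact choice of $f_{1,p}$, and how $\varepsilon_1$ is split across bits, are not reproduced here. As an assumption, the sketch below gives every bit the same hypothetical budget `eps_bit` and keeps the true bit with the classic Warner randomized-response probability $f = e^{\varepsilon}/(1+e^{\varepsilon})$ [30]. The aggregator then debiases each bit position and recombines the positions with weights $2^p$ (least significant bit first, also an assumption) to estimate the sum of the users' values. All helper names are hypothetical.

```python
import math
import random
from typing import List

def to_bits(value: int, length: int) -> List[int]:
    """Encode a non-negative integer as `length` bits, least significant bit first."""
    if not 0 <= value < 2 ** length:
        raise ValueError("value does not fit in the given bit length")
    return [(value >> p) & 1 for p in range(length)]

def keep_probability(eps_bit: float) -> float:
    """Warner randomized response: report the true bit with probability e^eps / (1 + e^eps)."""
    return math.exp(eps_bit) / (1.0 + math.exp(eps_bit))

def perturb_bits(bits: List[int], eps_bit: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability 1 - f.

    Each bit is eps_bit-LDP; by sequential composition the whole string costs
    len(bits) * eps_bit.
    """
    f = keep_probability(eps_bit)
    return [b if rng.random() < f else 1 - b for b in bits]

def estimate_sum(reports: List[List[int]], eps_bit: float) -> float:
    """Unbiased estimate of the sum of the users' original integers."""
    if eps_bit <= 0:
        raise ValueError("eps_bit must be positive")
    f = keep_probability(eps_bit)
    n = len(reports)
    total = 0.0
    for p in range(len(reports[0])):
        observed_ones = sum(r[p] for r in reports)
        # E[observed_ones] = n * (1 - f) + (2f - 1) * true_ones; invert for true_ones.
        true_ones = (observed_ones - n * (1 - f)) / (2 * f - 1)
        total += (2 ** p) * true_ones
    return total

rng = random.Random(7)
values = [rng.randrange(0, 256) for _ in range(20000)]
reports = [perturb_bits(to_bits(v, 8), 1.0, rng) for v in values]
print(sum(values), round(estimate_sum(reports, 1.0)))
```

The estimate is unbiased but its variance grows with the bit weights, so high-order bits dominate the error; this is one reason a scheme may assign different response probabilities to different bit positions, as the per-bit $f_{1,p}$ notation suggests.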
Fig. 2 Overall structure of M-kCCIA

Fig. 3 Overall structure of Bc-PPkCA

Tab. 2 Function comparison of initialization algorithms

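Algorithm	Parties	Initial cluster centers can be tampered with	Protects cluster-center privacy	Users obtain correct cluster centers
Algorithm of Ref. [6]	Two-party	Yes	Yes	No
Algorithm of Ref. [25]	Multi-party	Yes	Yes	No
M-kCCIA	Multi-party	No	Yes	Yes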

Tab. 3 Function comparison of privacy-preserving k-means algorithms

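Algorithm	Parties	Protects user privacy	Protects intermediate iteration data	Protects clustering results	Data can be tampered with	Uses blockchain
Algorithm of Ref. [4]	Two-party	Yes	No	No	Yes	No
Algorithm of Ref. [5]	Multi-party	Yes	Yes	Yes	Yes	No
Algorithm of Ref. [8]	Multi-party	Yes	Yes	Yes	Yes	No
Algorithm of Ref. [24]	Multi-party	Yes	Yes	Yes	Yes	No
Algorithm of Ref. [26]	Multi-party	Yes	Yes	Yes	Yes	No
Bc-PPkCA	Multi-party	Yes	Yes	Yes	No	Yes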

Tab. 4 Description of datasets

Dataset	Dimensions	Samples
HTRU2	9	17 898
Abalone	8	4 177

Fig. 4 Comparison of iteration efficiency of initialization algorithms

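Fig. 4 compares initialization algorithms by how many iterations k-means needs before it converges. For reference, the sketch below is plain, non-private Lloyd's k-means [27]: it assigns each point to the nearest center by squared Euclidean distance, recomputes each center as the mean of its members, and stops when the assignments no longer change, returning the iteration count. The RS baseline is modelled here, as an assumption, by picking $k$ distinct data points uniformly at random. The convergence test and function names are hypothetical, and this code does not implement M-kCCIA.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def random_selection(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """RS-style initialization (assumed form): k distinct data points chosen at random."""
    return rng.sample(data, k)

def kmeans(data: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int], int]:
    """Lloyd's k-means; returns (centers, labels, iterations until assignments stop changing)."""
    labels: List[int] = []
    for it in range(1, max_iter + 1):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [min(range(len(centers)), key=lambda q: sq_dist(x, centers[q]))
                      for x in data]
        if new_labels == labels:
            return centers, labels, it
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        updated: List[Point] = []
        for q in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == q]
            # Keep the old center if a cluster becomes empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[q])
        centers = updated
    return centers, labels, max_iter

rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
data += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(200)]
centers, labels, iters = kmeans(data, random_selection(data, 2, rng))
print("iterations:", iters)
```

Averaging `iters` over many random seeds gives an average-iteration figure of the kind the abstract reports for RS; a better initializer starts closer to the final centers and so needs fewer rounds.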
Fig. 5 Comparison of cluster centers

Fig. 6 Plaintext clustering results on HTRU2

Fig. 7 M-PPkCS/B clustering results on HTRU2

Fig. 8 Plaintext clustering results on Abalone

Fig. 9 M-PPkCS/B clustering results on Abalone

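Figs. 6 to 9 set the plaintext clustering of each dataset beside the M-PPkCS/B result, and the abstract reports accuracies of 97.53% (HTRU2) and 96.19% (Abalone). The paper's exact definition of accuracy is not reproduced here. One common way to score agreement, assumed in the sketch below, is the fraction of points whose private-scheme cluster label matches the plaintext label after the best one-to-one relabelling of clusters; brute force over permutations is adequate for small $k$. The function name `matched_accuracy` is hypothetical.

```python
from itertools import permutations
from typing import Sequence

def matched_accuracy(reference: Sequence[int], predicted: Sequence[int], k: int) -> float:
    """Share of points with equal labels under the best permutation of predicted labels.

    Labels are assumed to be integers in range(k); the cost is O(k! * n), so keep k small.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    best = 0
    for perm in permutations(range(k)):
        # perm[p] is the plaintext label that predicted label p is mapped to.
        hits = sum(1 for r, p in zip(reference, predicted) if perm[p] == r)
        best = max(best, hits)
    return best / len(reference)

plain = [0, 0, 0, 1, 1, 1, 2, 2]
private = [2, 2, 1, 0, 0, 0, 1, 1]
print(matched_accuracy(plain, private, k=3))  # 0.875
```

Relabelling is needed because cluster indices are arbitrary: two runs can find identical clusters yet number them differently.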
References (32)

1	ZHOU L N, PAN S M, WANG J W, et al. Machine learning on big data: opportunities and challenges[J]. Neurocomputing, 2017, 237: 350-361. DOI: 10.1016/j.neucom.2017.01.026
2	QIU J F, WU Q H, DING G R, et al. A survey of machine learning for big data processing[J]. EURASIP Journal on Advances in Signal Processing, 2016, 2016: No. 67. DOI: 10.1186/s13634-016-0355-x
3	JHA S, KRUGER L, McDANIEL P. Privacy preserving clustering[C]// Proceedings of the 2005 European Symposium on Research in Computer Security, LNCS 3679. Berlin: Springer, 2005: 397-417.
4	JIANG Z L, GUO N, JIN Y B, et al. Efficient two-party privacy-preserving collaborative k-means clustering protocol supporting both storage and computation outsourcing[J]. Information Sciences, 2020, 518: 168-180. DOI: 10.1016/j.ins.2019.12.051
5	WU W, LIU J, WANG H M, et al. Secure and efficient outsourced k-means clustering using fully homomorphic encryption with ciphertext packing technique[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(10): 3424-3437. DOI: 10.1109/tkde.2020.2969633
6	MOHASSEL P, ROSULEK M, TRIEU N. Practical privacy-preserving K-means clustering[J]. Proceedings on Privacy Enhancing Technologies, 2020, 2020(4): 414-433. DOI: 10.2478/popets-2020-0080
7	LIU Y, MA Z, YAN Z, et al. Privacy-preserving federated k-means for proactive caching in next generation cellular networks[J]. Information Sciences, 2020, 521: 14-31. DOI: 10.1016/j.ins.2020.02.042
8	GUO X C, LIN H, WU Y L, et al. A new data clustering strategy for enhancing mutual privacy in healthcare IoT systems[J]. Future Generation Computer Systems, 2020, 113: 407-417. DOI: 10.1016/j.future.2020.07.023
9	ZHANG E, CAI Y Q. Rational secure two-party computation protocol[J]. Journal of Computer Research and Development, 2013, 50(7): 1409-1417. (in Chinese)
10	CHENG B L, ZENG G S, JIE A Q. Trusted coalition-proof protocol model based on secure multi-party computing[J]. Journal on Communications, 2011, 32(8): 23-30. DOI: 10.3969/j.issn.1000-436X.2011.08.004 (in Chinese)
11	DINUR I, NISSIM K. Revealing information while preserving privacy[C]// Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. New York: ACM, 2003: 202-210. DOI: 10.1145/773153.773173
12	DWORK C, NISSIM K. Privacy-preserving datamining on vertically partitioned databases[C]// Proceedings of the 2004 Annual International Cryptology Conference, LNCS 3152. Berlin: Springer, 2004: 528-544.
13	DWORK C, McSHERRY F, NISSIM K, et al. Calibrating noise to sensitivity in private data analysis[C]// Proceedings of the 2006 Theory of Cryptography Conference, LNCS 3876. Berlin: Springer, 2006: 265-284.
14	McSHERRY F, TALWAR K. Mechanism design via differential privacy[C]// Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science. Piscataway: IEEE, 2007: 94-103. DOI: 10.1109/focs.2007.66
15	DWORK C. Differential privacy: a survey of results[C]// Proceedings of the 2008 International Conference on Theory and Applications of Models of Computation, LNCS 4978. Berlin: Springer, 2008: 1-19.
16	McSHERRY F D. Privacy integrated queries: an extensible platform for privacy-preserving data analysis[C]// Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2009: 19-30. DOI: 10.1145/1559845.1559850
17	GUPTA A, LIGETT K, McSHERRY F, et al. Differentially private combinatorial optimization[C]// Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2010: 1106-1125. DOI: 10.1137/1.9781611973075.90
18	BLUM A, DWORK C, McSHERRY F, et al. Practical privacy: the SuLQ framework[C]// Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. New York: ACM, 2005: 128-138. DOI: 10.1145/1065167.1065184
19	NISSIM K, RASKHODNIKOVA S, SMITH A. Smooth sensitivity and sampling in private data analysis[C]// Proceedings of the 39th Annual ACM Symposium on Theory of Computing. New York: ACM, 2007: 75-84. DOI: 10.1145/1250790.1250803
20	DWORK C. A firm foundation for private data analysis[J]. Communications of the ACM, 2011, 54(1): 86-95. DOI: 10.1145/1866739.1866758
21	SU D, CAO J N, LI N H, et al. Differentially private k-means clustering and a hybrid approach to private optimization[J]. ACM Transactions on Privacy and Security, 2017, 20(4): No. 16. DOI: 10.1145/3133201
22	FELDMAN D, XIANG C Y, ZHU R H, et al. Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks[C]// Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks. New York: ACM, 2017: 3-15. DOI: 10.1145/3055031.3055090
23	LU Z G, SHEN H. A convergent differentially private k-means clustering algorithm[C]// Proceedings of the 2019 Pacific-Asia Conference on Knowledge Discovery and Data Mining, LNCS 11439. Cham: Springer, 2019: 612-624.
24	NI T J, QIAO M H, CHEN Z L, et al. Utility-efficient differentially private K-means clustering based on cluster merging[J]. Neurocomputing, 2021, 424: 205-214. DOI: 10.1016/j.neucom.2020.10.051
25	ZHANG E, LI H M, HUANG Y C, et al. Practical multi-party private collaborative k-means clustering[J]. Neurocomputing, 2022, 467: 256-265. DOI: 10.1016/j.neucom.2021.09.050
26	XIA C, HUA J Y, TONG W, et al. Distributed K-means clustering guaranteeing local differential privacy[J]. Computers and Security, 2020, 90: No. 101699. DOI: 10.1016/j.cose.2019.101699
27	MacQUEEN J. Some methods for classification and analysis of multivariate observations[C]// Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Oakland, CA: University of California Press, 1967: 281-297.
28	XIONG P, ZHU T Q, WANG X F. A survey on differential privacy and applications[J]. Chinese Journal of Computers, 2014, 37(1): 101-122. DOI: 10.3724/SP.J.1016.2014.00101 (in Chinese)
29	YE Q Q, MENG X F, ZHU M J, et al. Survey on local differential privacy[J]. Journal of Software, 2018, 29(7): 1981-2005. (in Chinese)
30	WARNER S L. Randomized response: a survey technique for eliminating evasive answer bias[J]. Journal of the American Statistical Association, 1965, 60(309): 63-69. DOI: 10.1080/01621459.1965.10480775
31	NOFER M, GOMBER P, HINZ O, et al. Blockchain[J]. Business and Information Systems Engineering, 2017, 59(3): 183-187. DOI: 10.1007/s12599-017-0467-3
32	CHRISTIDIS K, DEVETSIKIOTIS M. Blockchains and smart contracts for the Internet of Things[J]. IEEE Access, 2016, 4: 2292-2303. DOI: 10.1109/access.2016.2566339