Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3801-3812.DOI: 10.11772/j.issn.1001-9081.2021091640

Special Issue: 网络空间安全

• Cyber security • Previous Articles     Next Articles

Multi-party privacy preserving k-means clustering scheme based on blockchain

Le ZHAO1,2, En ZHANG1,2(), Leiyong QIN1,2, Gongli LI1,2   

  1. 1.College of Computer and Information Engineering,Henan Normal University,Xinxiang Henan 453007,China
    2.Engineering Lab of Intelligence Business and Internet of Things of Henan Province (Henan Normal University),Xinxiang Henan 453007,China
  • Received:2021-09-17 Revised:2022-01-10 Accepted:2022-01-19 Online:2022-04-15 Published:2022-12-10
  • Contact: En ZHANG
  • About author:ZHAO Le, born in 1997, M. S. candidate. Her research interests include information security, cryptography.
    QIN Leiyong, born in 1997, M. S. candidate. His research interests include information security, cryptography.
    LI Gongli, born in 1981, Ph. D., associate professor. Her research interests include information security, cryptography.
  • Supported by:
    National Natural Science Foundation of China(U1604156);Science and Technology Research Program of Henan Province(192102210131);Soft Science Research Project of Henan Province(212400410109)

基于区块链的多方隐私保护k-means聚类方案

赵乐1,2, 张恩1,2(), 秦磊勇1,2, 李功丽1,2   

  1. 1.河南师范大学 计算机与信息工程学院,河南 新乡 453007
    2.智慧商务与物联网技术河南省工程实验室(河南师范大学),河南 新乡 453007
  • 通讯作者: 张恩
  • 作者简介:赵乐(1997—),女,河南驻马店人,硕士研究生,主要研究方向:信息安全、密码学
    秦磊勇(1997—),男,河南商丘人,硕士研究生,主要研究方向:信息安全、密码学
    李功丽(1981—),女,河南信阳人,副教授,博士,CCF会员,主要研究方向:信息安全、密码学。
  • 基金资助:
    国家自然科学基金资助项目(U1604156);河南省科技攻关计划项目(192102210131);河南省软科学研究计划项目(212400410109)

Abstract:

In order to solve the problems that the iterative efficiencies of the existing privacy protection k-means clustering schemes are low, the server in the centralized differential privacy preserving k-means clustering scheme may be attacked, and the server in the localized differential privacy protection k-means clustering scheme may return wrong clustering results, a Multi-party Privacy Protection k-means Clustering Scheme based on Blockchain (M-PPkCS/B) was proposed. Taking advantages of localized differential privacy technology and the characteristics of the blockchain such as being open, transparent, and non-tamperable, firstly, a Multi-party k-means Clustering Center Initialization Algorithm (M-kCCIA) was designed to improve the iterative efficiency of clustering while protecting user privacy, and ensure the correctness of initial clustering centers jointly generated by the users. Then, a Blockchain-based Privacy Protection k-means Clustering Algorithm (Bc-PPkCA) was designed, and a smart contract of clustering center updating algorithm was constructed. The clustering center was updated iteratively by the above smart contract on the blockchain to ensure that each user was able to obtain the correct clustering results. Through experiments on the datasets HTRU2 and Abalone, the results show that while ensuring that each user obtains the correct clustering results, the accuracy can reach 97.53% and 96.19% respectively, the average iteration times of M-kCCIA is 5.68 times and 2.75 times less than that of the algorithm of randomly generating initial cluster center called Random Selection (RS).

Key words: k-means clustering, privacy preserving, local differential privacy, blockchain, smart contract

摘要:

针对现有隐私保护k-means聚类方案迭代效率不高,中心化差分隐私保护k-means聚类方案中服务器会遭受攻击,以及本地化差分隐私保护k-means聚类方案中服务器会返回错误聚类结果的问题,提出了一种基于区块链的多方隐私保护k-means聚类方案(M-PPkCS/B)。利用本地化差分隐私技术的优势及区块链公开透明、不可篡改的特性,首先,设计一种多方k-means聚类中心初始化算法(M-kCCIA),在保护用户隐私的同时,提高聚类的迭代效率,并确保用户联合产生初始聚类中心的正确性;然后,设计一种基于区块链的隐私保护k-means聚类算法(Bc-PpkCA),并构建聚类中心更新算法的智能合约来在区块链上迭代更新聚类中心,从而保证各个用户都能得到正确的聚类结果。在数据集HTRU2和Abalone上进行实验的结果表明,在确保各个用户得到正确聚类结果的同时,两个数据集的准确率分别能达到97.53%和96.19%,M-kCCIA的平均迭代次数与随机化初始聚类中心算法RS的平均迭代次数相比,在两个数据集上分别减少了5.68次和2.75次。

关键词: k-means聚类, 隐私保护, 本地化差分隐私, 区块链, 智能合约

CLC Number: