Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (2): 413-421.DOI: 10.11772/j.issn.1001-9081.2020060766

Special Issue: 网络空间安全

• Cyber security • Previous Articles     Next Articles

Verifiable k-means clustering scheme with privacy-preserving

ZHANG En1,2, LI Huimin1,2, CHANG Jian1,2   

  1. 1. College of Computer and Information Engineering, Henan Normal University, Xinxiang Henan 453007, China;
    2. Engineering Lab of Intelligence Business and Internet of Things of Henan Province(Henan Normal University), Xinxiang Henan 453007, China
  • Received:2020-06-08 Revised:2020-08-13 Online:2021-02-10 Published:2020-12-17
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1604156, 61772176, 61602158), the Science and Technology Research Program of Henan Province (172102210045, 192102210131).


张恩1,2, 李会敏1,2, 常键1,2   

  1. 1. 河南师范大学 计算机与信息工程学院, 河南 新乡 453007;
    2. 智慧商务与物联网技术河南省工程实验室(河南师范大学), 河南 新乡 453007
  • 通讯作者: 张恩
  • 作者简介:张恩(1974-),男,河南新乡人,副教授,博士,CCF会员,主要研究方向:信息安全、密码学;李会敏(1996-),女,河南濮阳人,硕士研究生,主要研究方向:信息安全、密码学;常键(1996-),女,河南鹤壁人,硕士研究生,主要研究方向:信息安全、密码学。
  • 基金资助:

Abstract: The existing cloud outsourcing privacy-preserving k-means clustering schemes have the problem of low efficiency and the problem of returning unreasonable clustering results when the cloud server is untrusted or attacked by hackers. Therefore, a cloud outsourcing verifiable privacy-preserving k-means clustering scheme that can be applied to multi-party privacy-preserving scenarios was proposed. Firstly, an improved clustering initialization method suitable for cloud outsourcing scenarios was proposed to effectively improve the iterative efficiency of the algorithm. Secondly, the multiplicative triple technology was used to design the safe Euclidean distance algorithm, and the garbled circuit technology was used to design the algorithm for safe calculation of the minimum value. Finally, a verification algorithm was proposed, making the users only need one round of communication to verify the clustering results. And after the data outsourcing, the algorithm training was performed on the cloud entirely, which was able to effectively reduce the interactions between users and the cloud. Simulation results show that the accuracy of the proposed scheme is 97% and 93% on the datasets Synthetic and S1 respectively, indicating that the privacy-preserving k-means clustering is similar to the plaintext k-means clustering, and is suitable for medical, social sciences and business fields.

Key words: k-means clustering, cloud outsourcing, secure multi-party computation, privacy-preserving, verifiability

摘要: 针对现有云外包隐私保护k-means聚类方案存在的效率不高,以及当云服务器不可信或遭受黑客攻击时返回不合理聚类结果的问题,提出了一种可应用于多方隐私保护场景的云外包可验证隐私保护k-means聚类方案。首先,提出了一种适用于云外包场景的改进的聚类初始化方法,从而有效提高算法的迭代效率;然后,利用乘法三元组技术来设计安全欧几里得距离的计算,并利用混淆电路技术来设计安全计算最小值算法;最后,提出了一种验证算法,使用户仅需一轮通信就实现对聚类结果的验证,并且数据外包后算法的训练完全在云上进行,能够有效减少用户和云的交互。仿真实验表明,所提方案在数据集Synthetic和S1上的准确度分别达到97%和93%,说明隐私保护下的k-means聚类和明文k-means聚类的情况近似,适用于医疗、社会科学和商业等领域。

关键词: k-means聚类, 云外包, 安全多方计算, 隐私保护, 可验证性

CLC Number: