Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3813-3821. DOI: 10.11772/j.issn.1001-9081.2021101724

Special Issue: Cyberspace Security

• Cyber security

K-Prototypes clustering method for local differential privacy

Guopeng ZHANG1,2,3, Xuebin CHEN1,2,3, Haoshi WANG1,2,3, Ran ZHAI1,2,3, Zheng MA1,2,3

  1. College of Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
    2. Hebei Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan, Hebei 063210, China
    3. Tangshan Key Laboratory of Data Science (North China University of Science and Technology), Tangshan, Hebei 063210, China
  • Received: 2021-10-08 Revised: 2021-12-27 Accepted: 2022-01-05 Online: 2022-01-24 Published: 2022-12-10
  • Contact: Xuebin CHEN
  • About the authors: ZHANG Guopeng, born in 1996, M.S. candidate. His research interests include data security and privacy protection.
    WANG Haoshi, born in 1996, M.S. candidate. His research interests include data security and privacy protection.
    ZHAI Ran, born in 1998, M.S. candidate. Her research interests include data security and federated learning.
    MA Zheng, born in 1997, M.S. candidate. His research interests include network security and privacy protection.
  • Supported by:
    National Natural Science Foundation of China (U20A20179)


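  • About the author: ZHANG Guopeng (born 1996), male, from Wuwei, Gansu; M.S. candidate; CCF member. Main research interests: data security, privacy protection.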


To protect data privacy while preserving data utility in clustering analysis, a privacy-preserving clustering scheme based on Local Differential Privacy (LDP), named LDPK-Prototypes (LDP K-Prototypes), was proposed. Firstly, users encoded the mixed-type (numerical and categorical) data. Then, a randomized response mechanism was used to perturb the sensitive data; after collecting the users' perturbed data, the third party recovered the original dataset as accurately as possible. After that, the K-Prototypes clustering algorithm was performed. In the clustering process, the initial cluster centers were determined by a dissimilarity measure, and the distance formula was redefined using the entropy weight method. Theoretical analysis and experimental results show that, compared with the ODPC (Optimizing and Differentially Private Clustering) algorithm based on Centralized Differential Privacy (CDP), the proposed scheme improves the average accuracy on the Adult and Heart datasets by 2.95% and 12.41% respectively, effectively improving clustering utility. Meanwhile, LDPK-Prototypes enlarges the differences between data records, effectively avoids local optima, and improves the stability of the clustering algorithm.
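The abstract does not give the encoding, the perturbation probabilities, the recovery procedure or the redefined distance formula, so the following is only a minimal Python sketch of the general pipeline under stated assumptions, not the authors' LDPK-Prototypes implementation. It assumes generalized (k-ary) randomized response on an already-encoded categorical attribute, the standard unbiased frequency estimator on the collector side, the textbook entropy weight method, and a K-Prototypes-style dissimilarity with per-attribute weights. The function names `grr_perturb`, `grr_estimate`, `entropy_weights` and `mixed_distance` are hypothetical.

```python
import math
import random
from collections import Counter


# --- User side: generalized (k-ary) randomized response -------------------
# Assumption: each attribute value has already been encoded into a finite
# domain (e.g. numerical attributes discretized). The paper's actual encoding
# and perturbation probabilities are not given in the abstract.
def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p = e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 values uniformly at random."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


# --- Collector side: unbiased recovery of value frequencies ---------------
def grr_estimate(reports, domain, epsilon):
    """Estimate each value's true count from the perturbed reports.
    E[count(v)] = n_v * p + (n - n_v) * q, so n_v = (count(v) - n*q) / (p - q)."""
    n, k = len(reports), len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)  # prob. of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] - n * q) / (p - q) for v in domain}


# --- Clustering side: entropy weight method -------------------------------
def entropy_weights(rows):
    """Textbook entropy weight method on a matrix of non-negative scores
    (rows = records, columns = attributes; needs at least two rows).
    Attributes whose values vary more (lower entropy) get larger weights."""
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        col = [r[j] for r in rows]
        total = sum(col)
        if total == 0:
            divergences.append(0.0)  # an all-zero column carries no information
            continue
        h = -sum((x / total) * math.log(x / total) for x in col if x > 0)
        divergences.append(max(0.0, 1.0 - h / math.log(n)))  # 1 - normalized entropy
    s = sum(divergences)
    return [d / s for d in divergences] if s > 0 else [1.0 / m] * m


def mixed_distance(x, center, num_idx, cat_idx, weights):
    """Hypothetical entropy-weighted K-Prototypes dissimilarity: weighted
    squared Euclidean distance on numerical attributes plus the weighted
    count of mismatches on categorical attributes."""
    num = sum(weights[j] * (x[j] - center[j]) ** 2 for j in num_idx)
    cat = sum(weights[j] for j in cat_idx if x[j] != center[j])
    return num + cat
```

In this sketch each user calls `grr_perturb` locally, so the collector only ever sees perturbed values; `grr_estimate` then recovers approximate counts, whose sum equals the number of reports because p + (k - 1)q = 1. How the paper applies the entropy weight method to categorical attributes and how it picks initial centers from the dissimilarity measure are not stated in the abstract, so those steps are not shown.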

Key words: Local Differential Privacy (LDP), K-Prototypes, randomized response mechanism, entropy weight method, privacy protection



