Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (5): 1497-1503. DOI: 10.11772/j.issn.1001-9081.2022040552

• Cyberspace security •

Improved K-anonymity privacy protection algorithm based on different sensitivities

翟冉 (Ran ZHAI)1,2,3, 陈学斌 (Xuebin CHEN)1,2,3, 张国鹏 (Guopeng ZHANG)1,2,3, 裴浪涛 (Langtao PEI)1,2,3, 马征 (Zheng MA)1,2,3

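  1. College of Sciences, North China University of Science and Technology, Tangshan, Hebei 063210, China
    2. Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan, Hebei 063210, China
    3. Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan, Hebei 063210, China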
  • Received: 2022-04-21 Revised: 2022-08-10 Accepted: 2022-08-18 Online: 2022-09-29 Published: 2023-05-10
  • Corresponding author: CHEN Xuebin
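  • About the authors: ZHAI Ran, female, born in 1998 in Tangshan, Hebei, M. S. candidate, CCF member. Her research interests include data security, network security and privacy protection.
    CHEN Xuebin, male, born in 1970 in Tangshan, Hebei, Ph. D., professor, CCF member. His research interests include data security, Internet of Things security and network security. chxb@qq.com
    ZHANG Guopeng, male, born in 1996 in Wuwei, Gansu, M. S. candidate, CCF member. His research interests include network security and privacy protection.
    PEI Langtao, male, born in 1997 in Yuncheng, Shanxi, M. S. candidate, CCF member. His research interests include data security and privacy protection.
    MA Zheng, male, born in 1997 in Tangshan, Hebei, M. S. candidate, CCF member. His research interests include data security and privacy protection.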
  • Supported by:
    National Natural Science Foundation of China (U20A20179)

Improved K-anonymity privacy protection algorithm based on different sensitivities

Ran ZHAI1,2,3, Xuebin CHEN1,2,3, Guopeng ZHANG1,2,3, Langtao PEI1,2,3, Zheng MA1,2,3

  1. College of Sciences, North China University of Science and Technology, Tangshan, Hebei 063210, China
    2. Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan, Hebei 063210, China
    3. Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  • Received: 2022-04-21 Revised: 2022-08-10 Accepted: 2022-08-18 Online: 2022-09-29 Published: 2023-05-10
  • Contact: Xuebin CHEN
  • About the authors: ZHAI Ran, born in 1998, M. S. candidate. Her research interests include data security, network security and privacy protection.
    CHEN Xuebin, born in 1970, Ph. D., professor. His research interests include data security, Internet of Things security and network security.
    ZHANG Guopeng, born in 1996, M. S. candidate. His research interests include network security and privacy protection.
    PEI Langtao, born in 1997, M. S. candidate. His research interests include data security and privacy protection.
    MA Zheng, born in 1997, M. S. candidate. His research interests include data security and privacy protection.
  • Supported by:
    National Natural Science Foundation of China (U20A20179)

Abstract:

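The development of machine learning requires large numbers of real datasets that balance data security and availability. To address this problem, a K-anonymity privacy protection algorithm based on Random Forest (RF), named RFK-anonymity privacy protection, is proposed. First, the RF algorithm is used to predict the sensitivity of each attribute value. Then, the k-means clustering algorithm clusters the attribute values by sensitivity, and the K-anonymity algorithm anonymizes the data to different degrees according to the sensitivity clusters of the attribute values. Finally, users choose for themselves which degree of anonymization they need in the data table. Experimental results show that on the Adult dataset, compared with data processed by the K-anonymity algorithm, the accuracy of data processed by the RFK-anonymity privacy protection algorithm is 0.5 and 1.6 percentage points higher at thresholds of 3 and 4, respectively; compared with data processed by the (p, α, k)-anonymity algorithm, it is 0.4 and 1.9 percentage points higher at thresholds of 4 and 5, respectively. The RFK-anonymity privacy protection algorithm effectively improves data availability while protecting data privacy, and is better suited to classification and prediction in machine learning.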

Key words: Random Forest (RF), K-anonymity, privacy protection, k-means, clustering algorithm

Abstract:

To address the problem that the development of machine learning requires a large number of real datasets offering both data security and availability, an improved K-anonymity privacy protection algorithm based on Random Forest (RF), named RFK-anonymity privacy protection, was proposed. Firstly, the sensitivity of each attribute value was predicted by the RF algorithm. Secondly, the attribute values were clustered by sensitivity using the k-means clustering algorithm, and the data were anonymized to different degrees by the K-anonymity algorithm according to the sensitivity clusters of the attribute values. Finally, users selected the data table with the degree of anonymization that met their needs. Experimental results show that on the Adult dataset, compared with the data processed by the K-anonymity algorithm, the accuracies of the data processed by the RFK-anonymity privacy protection algorithm are increased by 0.5 and 1.6 percentage points at thresholds of 3 and 4, respectively; compared with the data processed by the (p, α, k)-anonymity algorithm, the accuracies of the data processed by the proposed algorithm are improved by 0.4 and 1.9 percentage points at thresholds of 4 and 5, respectively. The RFK-anonymity privacy protection algorithm can therefore effectively improve the availability of data while protecting data privacy and security, and it is more suitable for classification and prediction in machine learning.

Key words: Random Forest (RF), K-anonymity, privacy protection, k-means, clustering algorithm
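
The three steps named in the abstract (score sensitivity, cluster by sensitivity, anonymize per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: the sensitivity scores are made-up numbers standing in for the Random Forest output, the toy table and the helper names `kmeans_1d`, `is_k_anonymous` and `suppress` are hypothetical, and plain suppression with `*` is assumed in place of whatever generalization scheme the paper uses.

```python
from collections import Counter

# Hypothetical sensitivity scores per attribute. In the paper these would come
# from a Random Forest; here they are made-up numbers for illustration only.
sensitivity = {"age": 0.30, "zipcode": 0.25, "education": 0.05, "sex": 0.04}


def kmeans_1d(values, k, iters=20):
    """Plain Lloyd's k-means on scalars (k >= 2); returns one label per value.

    Centroids start evenly spaced over the value range, so label 0 is the
    least sensitive cluster and label k - 1 the most sensitive one.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def is_k_anonymous(rows, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(count >= k for count in groups.values())


def suppress(rows, attrs):
    """Replace the values of the given attributes with '*' (simple suppression)."""
    return [{a: ("*" if a in attrs else v) for a, v in row.items()} for row in rows]


# Toy table (invented records, not taken from the Adult dataset).
rows = [
    {"age": 29, "zipcode": "063210", "education": "Bachelors", "sex": "F"},
    {"age": 35, "zipcode": "063211", "education": "Bachelors", "sex": "F"},
    {"age": 41, "zipcode": "063212", "education": "Masters", "sex": "M"},
    {"age": 52, "zipcode": "063213", "education": "Masters", "sex": "M"},
]

names = list(sensitivity)
labels = kmeans_1d([sensitivity[n] for n in names], k=2)
most_sensitive = {n for n, lab in zip(names, labels) if lab == max(labels)}
hidden = suppress(rows, most_sensitive)

print(sorted(most_sensitive))  # ['age', 'zipcode']
print(is_k_anonymous(rows, names, 2), is_k_anonymous(hidden, names, 2))  # False True
```

With more than two clusters, one table could be produced per sensitivity level by suppressing progressively more clusters, which would give users the choice of anonymization degree that the abstract describes.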

CLC number: