Journal of Computer Applications, 2020, Vol. 40, Issue (2): 491-496. DOI: 10.11772/j.issn.1001-9081.2019091639

• The 7th CCF Big Data Conference •

Personalized privacy protection method for data with multiple numerical sensitive attributes

Meishu ZHANG1,2, Yabin XU1,2,3

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101, China
  2. School of Computer, Beijing Information Science and Technology University, Beijing 100101, China
  3. Beijing Advanced Innovation Center for Materials Genome Engineering (Beijing Information Science and Technology University), Beijing 100101, China
  • Received: 2019-08-30 Revised: 2019-10-10 Accepted: 2019-10-11 Online: 2019-10-31 Published: 2020-02-10
  • Contact: Yabin XU
  • About author: ZHANG Meishu, born in 1994 in Zhoukou, Henan, M. S. candidate. Her research interests include big data privacy protection and quantum encryption communication.
  • Supported by:
    the National Natural Science Foundation of China (61672101); the Foundation of Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDDXN004); the Key Lab of Information Network Security, Ministry of Public Security (C18601)

Abstract:

Existing privacy protection methods for data with multiple numerical sensitive attributes lose a large amount of quasi-identifier information and cannot satisfy users' personalized requirements for ranking the importance of numerical sensitive attributes. To address both problems, a personalized privacy protection method based on clustering and weighted Multi-Sensitive Bucketization (MSB) was proposed. Firstly, according to the similarity of quasi-identifiers, the dataset was divided into several subsets whose quasi-identifier attribute values are close. Then, because users differ in how sensitive they consider each sensitive attribute, the sensitivity and the bucket capacity of the multi-dimensional buckets were used to calculate the weighted selectivity and to construct the weighted multi-dimensional buckets. Finally, the data were grouped and anonymized accordingly. Eight attributes from the standard UCI Adult dataset were selected for experiments, and the proposed method was compared with MNSACM and WMNSAPM. Experimental results show that the proposed method performs better overall and is significantly superior to the comparison methods in reducing information loss and running time, which improves data quality and running efficiency.

Key words: privacy protection, multiple numerical sensitive attributes, clustering, anonymization, personalization
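
The grouping step summarized in the abstract can be illustrated with a minimal Python sketch for a single quasi-identifier cluster. It is not the authors' algorithm: the abstract does not give the weighted selectivity formula, so the score used here (a weighted sum of the remaining one-dimensional bucket capacities) is an assumption, as are the equal-width bucketing and the hypothetical names `bucket_index`, `build_cells`, `weighted_selectivity`, `group_records`, and `generalize`.

```python
from collections import defaultdict


def bucket_index(value, lo, hi, m):
    """Map a numeric value to one of m equal-width buckets over [lo, hi]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * m), m - 1)


def build_cells(records, sens_keys, m):
    """Place every record in one cell of the multi-dimensional bucket grid."""
    ranges = {k: (min(r[k] for r in records), max(r[k] for r in records))
              for k in sens_keys}
    cells = defaultdict(list)
    for r in records:
        cell = tuple(bucket_index(r[k], *ranges[k], m) for k in sens_keys)
        cells[cell].append(r)
    return cells


def weighted_selectivity(cell, cells, weights):
    """ASSUMED score: per dimension, the number of records left in the
    one-dimensional bucket of this cell, weighted by the user's sensitivity."""
    score = 0.0
    for dim, w in enumerate(weights):
        capacity = sum(len(rs) for c, rs in cells.items() if c[dim] == cell[dim])
        score += w * capacity
    return score


def group_records(records, sens_keys, weights, m, l):
    """Greedily form groups of l records whose cells share no bucket index
    in any sensitive dimension; records that cannot be grouped are left over."""
    if not records:
        return [], []
    cells = build_cells(records, sens_keys, m)
    dims = range(len(sens_keys))
    groups = []
    while True:
        candidates = [c for c, rs in cells.items() if rs]
        scores = {c: weighted_selectivity(c, cells, weights) for c in candidates}
        chosen, used = [], [set() for _ in dims]
        # Highest score first; the cell tuple breaks ties deterministically.
        for c in sorted(candidates, key=lambda c: (-scores[c], c)):
            if all(c[d] not in used[d] for d in dims):
                chosen.append(c)
                for d in dims:
                    used[d].add(c[d])
                if len(chosen) == l:
                    break
        if len(chosen) < l:
            break
        groups.append([cells[c].pop() for c in chosen])
    leftovers = [r for rs in cells.values() for r in rs]
    return groups, leftovers


def generalize(group, qi_keys):
    """Replace each quasi-identifier by the (min, max) range of its group."""
    return {k: (min(r[k] for r in group), max(r[k] for r in group))
            for k in qi_keys}


# Made-up records: "age" is the quasi-identifier, the other two are sensitive.
records = [
    {"age": 25, "income": 10, "debt": 90},
    {"age": 27, "income": 50, "debt": 50},
    {"age": 30, "income": 90, "debt": 10},
    {"age": 31, "income": 12, "debt": 88},
    {"age": 33, "income": 55, "debt": 45},
    {"age": 36, "income": 88, "debt": 12},
]
groups, leftovers = group_records(records, ["income", "debt"], (0.7, 0.3), m=3, l=3)
for g in groups:
    print(generalize(g, ["age"]), [(r["income"], r["debt"]) for r in g])
```

In this sketch a larger weight lets buckets of the more sensitive attribute dominate the selection order, which is one plausible reading of how sensitivity and bucket capacity could be combined. The clustering step on quasi-identifiers is omitted; the function is meant to be applied to each cluster separately.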

CLC number: