Journal of Computer Applications, 2020, Vol. 40, Issue (2): 491-496. DOI: 10.11772/j.issn.1001-9081.2019091639

• CCF Bigdata 2019

Personalized privacy protection method for data with multiple numerical sensitive attributes

Meishu ZHANG1,2, Yabin XU1,2,3

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101, China
  2. School of Computer, Beijing Information Science and Technology University, Beijing 100101, China
  3. Beijing Advanced Innovation Center for Materials Genome Engineering (Beijing Information Science and Technology University), Beijing 100101, China
  • Received:2019-08-30 Revised:2019-10-10 Accepted:2019-10-11 Online:2019-10-31 Published:2020-02-10
  • Contact: Yabin XU
  • About author: ZHANG Meishu, born in 1994, a native of Zhoukou, Henan, M. S. candidate. Her research interests include big data privacy protection and quantum encryption communication.
  • Supported by: the National Natural Science Foundation of China (61672101); the Foundation of Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDDXN004); the Key Lab of Information Network Security, Ministry of Public Security (C18601)




Existing privacy protection methods for data with multiple numerical sensitive attributes suffer from two problems: they lose a large amount of information about quasi-identifier attributes, and they cannot satisfy users' personalized needs for ranking the importance of numerical sensitive attributes. To solve these problems, a personalized privacy protection method based on clustering and weighted Multi-Sensitive Bucketization (MSB) was proposed. Firstly, according to the similarity of quasi-identifiers, the dataset was divided into several subsets with similar quasi-identifier attribute values. Then, to account for the different sensitivities that users assign to sensitive attributes, the sensitivity and the bucket capacity of the multi-dimensional buckets were used to calculate the weighted selectivity and to construct the weighted multi-dimensional buckets. Finally, the data were grouped and anonymized accordingly. Eight attributes from the UCI standard Adult dataset were selected for experiments, and the proposed method was compared with MNSACM and WMNSAPM. Experimental results show that the proposed method performs better overall and is significantly superior to the comparison methods in reducing information loss and running time, which improves data quality and operating efficiency.
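
As a rough illustration of the weighted bucketization step, the following Python sketch places records into multi-dimensional buckets by their sensitive attribute values and then greedily forms groups whose members come from buckets that differ on every sensitive attribute. It is only a sketch under stated assumptions, not the method evaluated in the paper: the abstract does not give the weighted selectivity formula, so `weighted_score` uses a hypothetical one (the weight of each attribute times the number of remaining records sharing the bucket's interval on that attribute). The equal-width intervals, the function names, and the parameters `n_bins` and `l` are likewise assumptions, and the clustering step on quasi-identifiers is omitted.

```python
from collections import defaultdict


def interval_index(value, lo, hi, n_bins):
    """Map a numeric value to one of n_bins equal-width intervals on [lo, hi]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


def build_buckets(records, sens_keys, n_bins):
    """Group records by their tuple of interval indices, one per sensitive attribute."""
    bounds = {k: (min(r[k] for r in records), max(r[k] for r in records))
              for k in sens_keys}
    buckets = defaultdict(list)
    for r in records:
        key = tuple(interval_index(r[k], *bounds[k], n_bins) for k in sens_keys)
        buckets[key].append(r)
    return buckets


def weighted_score(key, buckets, weights):
    """Hypothetical weighted selectivity (an assumption, not the paper's formula):
    per attribute, the number of remaining records in buckets sharing this
    bucket's interval, scaled by the user-assigned weight of that attribute."""
    score = 0.0
    for dim, w in enumerate(weights):
        capacity = sum(len(recs) for k, recs in buckets.items() if k[dim] == key[dim])
        score += w * capacity
    return score


def form_groups(records, sens_keys, weights, n_bins=3, l=2):
    """Greedily build groups of l records whose buckets differ on every dimension."""
    buckets = build_buckets(records, sens_keys, n_bins)
    groups = []
    while True:
        # Rank non-empty buckets by the (assumed) weighted score, highest first.
        ranked = sorted((k for k in buckets if buckets[k]),
                        key=lambda k: weighted_score(k, buckets, weights),
                        reverse=True)
        chosen = []
        for k in ranked:
            # Accept a bucket only if it shares no interval with those already chosen.
            if all(k[d] != c[d] for c in chosen for d in range(len(sens_keys))):
                chosen.append(k)
            if len(chosen) == l:
                break
        if len(chosen) < l:
            break
        groups.append([buckets[k].pop() for k in chosen])
    leftovers = [r for recs in buckets.values() for r in recs]
    return groups, leftovers
```

For example, calling `form_groups(records, ["gain", "hours"], weights=[0.7, 0.3])` on a list of dictionaries with hypothetical numeric fields `gain` and `hours` returns the formed groups together with any records that could not be grouped. A larger weight makes crowded intervals of that attribute more likely to be drawn from first.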

Key words: privacy protection, multiple numerical sensitive attributes, clustering, anonymity, personalization



