Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (1): 73-78.DOI: 10.11772/j.issn.1001-9081.2017071944

Previous Articles     Next Articles

Improvement of differential privacy protection algorithm based on OPTICS clustering

WANG Hong1,2, GE Lina1,2, WANG Suqing3, WANG Liying1,2, ZHANG Yipeng1,2, LIANG Juncheng4   

  1. 1. College of Information Science and Engineering, Guangxi University for Nationalities, Nanning Guangxi 530006, China;
    2. China-ASEAN Research Center(Guangxi Science Experiment Center), Guangxi University for Nationalities, Nanning Guangxi 530006, China;
    3. Shenzhen Billion Weir Information Technology Limited Liability Company, Shenzhen Guangdong 518000, China;
    4. Guangxi Radio and Television Information Network Limited Liability Company, Nanning Guangxi 530006, China
  • Received:2017-08-07 Revised:2017-09-10 Online:2018-01-10 Published:2018-01-22
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61462009), the Scientific Research Foundation of Guangxi University for Nationalities (2014MDYB029), the China-ASEAN Research Center of Guangxi University for Nationalities (Guangxi Science Experimental Center) 2014 Open Project (TD201404), the Graduate Research and Innovation Project of Guangxi University for Nationalities (gxun-chxps201671).

基于OPTICS聚类的差分隐私保护算法的改进

王红1,2, 葛丽娜1,2, 王苏青3, 王丽颖1,2, 张翼鹏1,2, 梁竣程4   

  1. 1. 广西民族大学 信息科学与工程学院, 南宁 530006;
    2. 广西民族大学 东盟研究中心(广西科学实验中心), 南宁 530006;
    3. 深圳市亿威尔信息技术股份有限公司, 广东 深圳 518000;
    4. 广西广播电视信息网络股份有限公司, 南宁 530006
  • 通讯作者: 葛丽娜
  • 作者简介:王红(1990-),女,河南开封人,硕士,主要研究方向:信息安全;葛丽娜(1969-),女(毛南族),广西环江人,教授,博士,主要研究方向:信息安全;王苏青(1980-),女,广东深圳人,主要研究方向:网络安全;王丽颖(1991-),女,吉林长春人,硕士,主要研究方向:信息安全;张翼鹏(1991-),男(壮族),广西南宁人,硕士,主要研究方向:信息安全;梁竣程(1982-),男,广西南宁人,主要研究方向:信息安全。
  • 基金资助:
    国家自然科学基金资助项目(61462009);广西民族大学科研基金资助项目(2014MDYB029);广西民族大学中国-东盟研究中心(广西科学实验中心)2014年度开放课题项目(TD201404);广西民族大学研究生科研创新资助项目(gxun-chxps201671)。

Abstract: Clustering algorithm is used to preprocess personal privacy information in order to achieve differential privacy protection, which can reduce the reconstruction error caused by directly distributing histogram data, and the reconstruction error caused by different combining methods of histogram. Aiming at the problem of sensitivity to input data parameters in DP-DBSCAN (Differential Privacy-Density-Based Spatial Clustering of Applications with Noise) differential privacy algorithm, the OPTICS (Ordering Points To Identify Clustering Structure) algorithm based on density clustering was applied to differential privacy protection. And an improved differential privacy protection algorithm, called DP-OPTICS (Differential Privacy-Ordering Points To Identify Clustering Structure) was introduced, the sparse dataset was compressed, the same variance noise and different variance noise were used as two noise-adding ways by comparison, considering the probability of privacy information's being broken by the attacker, the upper bound of privacy parameter ε was determined, which effectively balanced the relationship between the privacy of sensitive information and the usability of data. The DP-OPTICS algorithm was compared with the differential privacy protection algorithm based on OPTICS clustering and DP-DBSCAN algorithm. The DP-OPTICS algorithm is between the other two in time consumption. However, in the case of having the same parameters, the stability of the DP-OPTICS algorithm is the best among them, so the improved OP-OPTICS differential privacy protection algorithm is generally feasible.

Key words: clustering algorithm, personal privacy, reconstruction error, differential privacy protection, Ordering Points To Identify Clustering Structure (OPTICS) algorithm

摘要: 采用聚类算法预先处理个人隐私信息实现差分隐私保护,能够减少直接发布直方图数据带来的噪声累积现象,同时减小了直方图因合并方式不同带来的重构误差。针对DP-DBSCAN差分隐私算法存在对数据参数输入敏感问题,将基于密度聚类的OPTICS算法应用于差分隐私保护中,并提出改进的DP-OPTICS差分隐私保护算法,对稀疏型数据集进行压缩处理,对比采用同方差噪声和异方差噪声两种添加噪声方式,考虑攻击者能够攻破隐私信息的概率,确定隐私参数ε的上界,有效平衡了敏感信息的隐私性和数据的可用性之间的关系。将DP-OPTICS算法和基于OPTICS聚类的差分隐私保护算法、DP-DBSCAN算法进行对比,DP-OPTICS算法在时间消耗上介于其余二者之间,但是在取得相同参数的情况下,聚类的稳定性在三者中最好,因此改进后OP-OPTICS差分隐私保护算法总体上是可行的。

关键词: 聚类算法, 个人隐私, 重构误差, 差分隐私保护, OPTICS算法

CLC Number: