Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (4): 1029-1033. DOI: 10.11772/j.issn.1001-9081.2014.04.1029

• Computer Security •

Privacy preserving clustering algorithm based on wavelet transform for distributed data

XUE Anrong, LIU Bin, WEN Dandan

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  • Received: 2013-09-29 Revised: 2013-11-15 Online: 2014-04-01 Published: 2014-04-29
  • Contact: LIU Bin
  • About the authors: XUE Anrong (1964-), male, born in Zhenjiang, Jiangsu; professor, Ph.D., CCF senior member; research interests: data mining, machine learning;
    LIU Bin (1987-), female, born in Haiyang, Shandong; master's student; research interest: data mining;
    WEN Dandan (1986-), female, born in Shangqiu, Henan; master's student; research interest: data mining.
  • Supported by:

    National Natural Science Foundation of China

Abstract:

Existing privacy-preserving clustering algorithms fail to strike a good trade-off between efficiency and privacy. To address this problem, a distributed privacy-preserving clustering algorithm that combines Secure Multi-party Computation (SMC) with data perturbation is proposed. Each data owner applies a wavelet transform to achieve both data compression and information hiding, and randomly permutes the attribute columns to prevent the information disclosure that data reconstruction could otherwise cause. Because only the compressed and permuted data take part in the distributed clustering computation, computation and communication costs are low and the algorithm is efficient, while the multiple protection measures effectively preserve data privacy. Because the wavelet transform has high fidelity, it has little effect on clustering accuracy. Theoretical analysis and experimental results show that the proposed algorithm is secure and efficient, and that on high-dimensional data its overall F-measure and execution efficiency are better than those of the DCT-H (Discrete Cosine Transform-Haar) algorithm, which resolves the trade-off between efficiency and privacy.
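The local perturbation step described in the abstract (wavelet compression followed by a random rearrangement of attribute columns) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet or the number of decomposition levels, so a single-level orthonormal Haar transform is assumed here, and the function names (`haar_approximation`, `permute_columns`, `perturb`) and the `seed` parameter are hypothetical. The SMC part of the algorithm is not shown.

```python
import numpy as np


def haar_approximation(X):
    """Single-level orthonormal Haar transform along the attribute axis,
    keeping only the approximation coefficients (assumed wavelet choice)."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] % 2:
        # Pad an odd number of attributes with one zero column.
        X = np.hstack([X, np.zeros((X.shape[0], 1))])
    # Each output column is a scaled sum of two adjacent attributes.
    return (X[:, 0::2] + X[:, 1::2]) / np.sqrt(2.0)


def permute_columns(X, rng):
    """Randomly reorder attribute columns; returns the data and the permutation."""
    perm = rng.permutation(X.shape[1])
    return X[:, perm], perm


def perturb(X, seed=0):
    """Compress with the Haar approximation, then shuffle the columns.
    The seed is a stand-in for however the parties choose the permutation."""
    rng = np.random.default_rng(seed)
    reduced = haar_approximation(X)
    return permute_columns(reduced, rng)


def pairwise_distances(X):
    """Euclidean distance between every pair of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(6, 8))
    hidden, perm = perturb(data, seed=42)
    print(data.shape, "->", hidden.shape)
    # Column permutation leaves row-to-row distances unchanged.
    print(np.allclose(pairwise_distances(haar_approximation(data)),
                      pairwise_distances(hidden)))
```

Under these assumptions, the column permutation leaves Euclidean distances between records unchanged, and dropping the Haar detail coefficients can only shrink those distances, never enlarge them. This is consistent with the abstract's claim that clustering accuracy is only mildly affected, although how small the loss is in practice depends on the data.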

CLC Number: