Privacy preserving clustering algorithm based on wavelet transform for distributed data

Journal of Computer Applications, 2014, Vol. 34, Issue (4): 1029-1033. DOI: 10.11772/j.issn.1001-9081.2014.04.1029

XUE Anrong, LIU Bin, WEN Dandan

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang Jiangsu 212013, China

Received: 2013-09-29 Revised: 2013-11-15 Online: 2014-04-01 Published: 2014-04-29
Corresponding author: LIU Bin
About the authors: XUE Anrong (born 1964), male, from Zhenjiang, Jiangsu; professor, Ph.D., CCF senior member; research interests: data mining, machine learning;
LIU Bin (born 1987), female, from Haiyang, Shandong; master's candidate; research interest: data mining;
WEN Dandan (born 1986), female, from Shangqiu, Henan; master's candidate; research interest: data mining.
Supported by:
National Natural Science Foundation of China

Abstract:

Existing privacy-preserving clustering algorithms fail to achieve a good trade-off between efficiency and privacy. To address this problem, a distributed privacy-preserving clustering algorithm that combines Secure Multi-party Computation (SMC) with data perturbation is proposed. Each data owner applies a wavelet transform to achieve both data compression and information hiding, and randomly rearranges the attribute columns to prevent the information disclosure that data reconstruction could cause. Because only the compressed and rearranged data take part in the distributed clustering computation, the computation and communication costs are low and the algorithm is efficient, while the multiple protection measures effectively preserve data privacy. Because the wavelet transform has high fidelity, it has little effect on clustering accuracy. Theoretical analysis and experimental results indicate that the proposed algorithm is secure and efficient, and that its overall F-measure and execution efficiency are better than those of the Haar-wavelet-based Discrete Cosine Transform (DCT-H) algorithm on high-dimensional data, thereby resolving the trade-off between efficiency and privacy.
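The compression and column-rearrangement steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which wavelet, how many decomposition levels, or how the column order is chosen, so the single-level Haar transform, the padding rule, the function names, and the shared `seed` below are all assumptions made for illustration.

```python
import math
import random


def haar_approximation(row):
    """One level of the orthonormal Haar transform, keeping only the
    approximation coefficients (assumed wavelet; halves the dimension)."""
    values = list(row)
    if len(values) % 2:
        values.append(values[-1])  # assumption: pad odd-length rows
    return [(values[i] + values[i + 1]) / math.sqrt(2)
            for i in range(0, len(values), 2)]


def compress_and_permute(records, seed):
    """Compress every record, then apply one secret column order to all of
    them. The seed is a hypothetical stand-in for whatever secret the data
    owners would agree on."""
    compressed = [haar_approximation(r) for r in records]
    order = list(range(len(compressed[0])))
    random.Random(seed).shuffle(order)
    return [[row[j] for j in order] for row in compressed]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


records = [
    [1.0, 1.2, 5.0, 5.1, 9.0, 9.2, 2.0, 2.1],
    [1.1, 1.0, 5.2, 5.0, 9.1, 9.0, 2.2, 2.0],
    [8.0, 8.1, 0.5, 0.4, 3.0, 3.2, 7.0, 7.1],
]
released = compress_and_permute(records, seed=42)

# Only `released` would take part in the distributed clustering step.
print(len(records[0]), "->", len(released[0]), "attributes per record")
print("similar pair:   ", euclidean(released[0], released[1]))
print("dissimilar pair:", euclidean(released[0], released[2]))
```

Applying the same column order to every record leaves Euclidean distances between the released records unchanged, and dropping the Haar detail coefficients can only shrink a distance, never enlarge it. In this toy data the similar pair stays much closer than the dissimilar pair, which is the property a distance-based clustering step relies on.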

CLC Number:

TP309

XUE Anrong, LIU Bin, WEN Dandan. Privacy preserving clustering algorithm based on wavelet transform for distributed data[J]. Journal of Computer Applications, 2014, 34(4): 1029-1033.

