Frequent sequence pattern mining with differential privacy

doi:10.11772/j.issn.1001-9081.2017.02.0316

Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (2): 316-321. DOI: 10.11772/j.issn.1001-9081.2017.02.0316

• 33rd National Database Conference of China (NDBC 2016) •

LI Yanhui, LIU Hao, YUAN Ye, WANG Guoren

School of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110819, China

Received: 2016-08-12 Revised: 2016-09-10 Online: 2017-02-10 Published: 2017-02-11
Corresponding author: LI Yanhui, lyhneu506822328@163.com
About the authors: LI Yanhui (born 1989), female, from Qiqihar, Heilongjiang, Ph.D. candidate; research interests: data privacy, data query processing. LIU Hao (born 1991), male, from Suizhou, Hubei, M.S. candidate; research interests: privacy protection, data mining. YUAN Ye (born 1981), male, from Shenyang, Liaoning, professor, Ph.D., CCF member; research interests: cloud computing, big data management. WANG Guoren (born 1966), male, from Xianning, Hubei, professor, Ph.D., CCF member; research interests: uncertain data management, graph data management, crowdsourced data management.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61033007, 61622202, 61572119), the National Program on Key Basic Research Project (973 Program) (2012CB316201), and the Fundamental Research Funds for the Central Universities (N150402005).

Abstract

When a data set contains sensitive information, directly releasing frequent sequence patterns and their true support counts may leak individuals' private information. To address this problem, a Differentially Private Frequent Sequence Mining (DP-FSM) algorithm is proposed. The algorithm uses the downward closure property to generate a candidate set of sequence patterns, applies a smart truncation technique to select the frequent patterns from the candidates, and then uses the geometric mechanism to perturb the true support of each selected pattern with noise. In addition, to improve the utility of the mining results, a threshold correction strategy is designed to reduce the truncation error and propagation error that arise during mining. Theoretical analysis proves that the algorithm satisfies ε-differential privacy. Experimental results show that the algorithm achieves a significantly lower False Negative Rate (FNR) and Relative Support Error (RSE) than the comparison algorithm PFS², effectively improving the accuracy of the mining results.

Keywords: frequent sequence mining, differential privacy (DP), privacy protection, geometric mechanism, data mining
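
Two of the building blocks named in the abstract, candidate generation via downward closure and perturbation of support counts with the geometric mechanism, can be illustrated with a short Python sketch. This is not the authors' DP-FSM implementation: the function names, the representation of patterns as tuples of single items, the GSP/Apriori-style join, and the default sensitivity of 1 are all assumptions made for illustration, and the smart truncation and threshold correction steps are omitted because the abstract does not specify them.

```python
import math
import random


def generate_candidates(frequent):
    """Join frequent length-k patterns into length-(k+1) candidates.

    Assumption: a pattern is a tuple of single items. A candidate is kept
    only if every subsequence obtained by dropping one item is frequent
    (the downward closure property).
    """
    frequent = set(frequent)
    candidates = set()
    for p in frequent:
        for q in frequent:
            # Join step: p without its first item equals q without its last.
            if p[1:] == q[:-1]:
                cand = p + (q[-1],)
                # Prune step: all one-item-shorter subsequences must be frequent.
                if all(cand[:i] + cand[i + 1:] in frequent
                       for i in range(len(cand))):
                    candidates.add(cand)
    return candidates


def two_sided_geometric(epsilon, sensitivity=1, rng=random):
    """Draw integer noise Z with Pr[Z = z] proportional to alpha ** abs(z),
    where alpha = exp(-epsilon / sensitivity)."""
    alpha = math.exp(-epsilon / sensitivity)

    def geometric():
        # Inverse-CDF sample: number of failures before the first success
        # when the success probability is 1 - alpha.
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return math.floor(math.log(u) / math.log(alpha))

    # The difference of two i.i.d. geometric draws is two-sided geometric.
    return geometric() - geometric()


def perturb_supports(supports, epsilon, sensitivity=1, rng=random):
    """Add independent geometric noise to each support count.

    Assumption: `epsilon` is the budget spent on each count; how the total
    privacy budget is split across patterns is not modeled here.
    """
    return {pattern: count + two_sided_geometric(epsilon, sensitivity, rng)
            for pattern, count in supports.items()}


if __name__ == "__main__":
    frequent_2 = {("a", "b"), ("b", "c"), ("a", "c")}
    print(generate_candidates(frequent_2))
    print(perturb_supports({("a", "b"): 120, ("b", "c"): 95}, epsilon=0.5))
```

Because the noise is integer-valued, the perturbed supports remain integers, which is one reason the geometric mechanism is a natural fit for count queries. Noisy counts can still be negative, so a real release would need to decide how to post-process them.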

LI Yanhui, LIU Hao, YUAN Ye, WANG Guoren. Frequent sequence pattern mining with differential privacy[J]. Journal of Computer Applications, 2017, 37(2): 316-321.

