计算机应用 ›› 2017, Vol. 37 ›› Issue (2): 316-321.DOI: 10.11772/j.issn.1001-9081.2017.02.0316

• 第33届中国数据库学术会议(NDBC 2016) • 上一篇    下一篇

基于差分隐私的频繁序列模式挖掘算法

李艳辉, 刘浩, 袁野, 王国仁   

  1. 东北大学 计算机科学与工程学院, 沈阳 110819
  • 收稿日期:2016-08-12 修回日期:2016-09-10 出版日期:2017-02-10 发布日期:2017-02-11
  • 通讯作者: 李艳辉,lyhneu506822328@163.com
  • 作者简介:李艳辉(1989-),女,黑龙江齐齐哈尔人,博士研究生,主要研究方向:数据隐私、数据查询处理;刘浩(1991-),男,湖北随州人,硕士研究生,主要研究方向:隐私保护、数据挖掘;袁野(1981-),男,辽宁沈阳人,教授,博士,CCF会员,主要研究方向:云计算、大数据管理;王国仁(1966-),男,湖北咸宁人,教授,博士,CCF会员,主要研究方向:不确定数据管理、图数据管理、众包数据管理。
  • 基金资助:

    国家自然科学基金资助项目(61033007,61622202,61572119);国家973计划项目(2012CB316201);教育部中央高校基本科研业务费资助项目(N150402005)。

Frequent sequence pattern mining with differential privacy

LI Yanhui, LIU Hao, YUAN Ye, WANG Guoren   

  1. School of Computer Science and Engineering, Northeastern University, Shenyang Liaoning 110819, China
  • Received:2016-08-12 Revised:2016-09-10 Online:2017-02-10 Published:2017-02-11
  • Supported by:

    This work is partially supported by the National Natural Science Foundation of China (61033007, 61622202, 61572119), the National Program on Key Basic Research Project (973 Program) (2012CB316201), the Fundamental Research Funds for the Central Universities (N150402005).

摘要:

针对当数据集含有敏感信息时,直接发布频繁序列模式本身及其支持度计数都有可能泄露用户隐私信息的问题,提出一种满足差分隐私(DP)的频繁序列模式挖掘(DP-FSM)算法。该算法利用向下封闭性质生成候选序列模式集,基于智能截断方法从候选模式中挑选出频繁的序列模式,最后采用几何机制对所选出模式的真实支持度添加噪声进行扰动。另外,为了提高挖掘结果的可用性,设计了一个阈值修正的策略来减小挖掘过程中的截断误差和传播误差。理论分析证明了该算法满足ε-差分隐私。实验结果表明了该算法在拒真率(FNR)和相对支持度误差(RSE)两个指标上明显低于对比算法PFS2,有效地提高了挖掘结果的准确度。

关键词: 频繁序列挖掘, 差分隐私, 隐私保护, 几何机制, 数据挖掘

Abstract:

Focusing on the issue that releasing frequent sequence patterns and the corresponding true supports may reveal the individuals' privacy when the data set contains sensitive information, a Differential Private Frequent Sequence Mining (DP-FSM) algorithm was proposed. Downward closure property was used to generate a candidate set of sequence patterns, smart truncating based technique was used to sample frequent patterns in the candidate set, and geometric mechanism was utilized to perturb the true supports of each sampled pattern. In addition, to improve the usability of the results, a threshold modification method was proposed to reduce truncation error and propagation error in mining process. The theoretical analysis show that the proposed method is ε-differentially private. The experimental results demonstrate that the proposed method has lower False Negative Rate (FNR) and Relative Support Error (RSE) than that of the comparison algorithm named PFS2, thus effectively improving the accuracy of mining results.

Key words: frequent sequence mining, Differential Privacy (DP), privacy protection, geometric mechanism, data mining

中图分类号: