《计算机应用》唯一官方网站 ›› 2023, Vol. 43 ›› Issue (7): 2057-2064.DOI: 10.11772/j.issn.1001-9081.2022091365

• 第39届CCF中国数据库学术会议(NDBC 2022) • 上一篇    

本地化差分隐私下的频繁序列模式挖掘算法PrivSPM

黄硕, 李艳辉(), 曹建秋   

  1. 重庆交通大学 信息科学与工程学院,重庆 400074
  • 收稿日期:2022-09-12 修回日期:2022-11-15 接受日期:2022-11-21 发布日期:2023-07-20 出版日期:2023-07-10
  • 通讯作者: 李艳辉
  • 作者简介:黄硕(1998—),男,河南漯河人,硕士研究生,主要研究方向:数据隐私、差分隐私;
    李艳辉(1989—),女,黑龙江齐齐哈尔人,讲师,博士,主要研究方向:数据隐私、差分隐私、大数据分析;
    曹建秋(1967—),男,湖南益阳人,教授,硕士,主要研究方向:图形图像处理、信息可视化、交通信息化、智能控制。
  • 基金资助:
    国家自然科学基金资助项目(62002036);上海市信息安全综合管理技术研究重点实验室开放课题(AGK2020006);重庆市自然科学基金资助项目(cstc2021jcyj-msxmX0859);重庆市教育委员会科学技术研究项目(KJQN202000707)

PrivSPM: frequent sequential pattern mining algorithm under local differential privacy

Shuo HUANG, Yanhui LI(), Jianqiu CAO   

  1. School of Information Science and Engineering,Chongqing Jiaotong University,Chongqing 400074,China
  • Received:2022-09-12 Revised:2022-11-15 Accepted:2022-11-21 Online:2023-07-20 Published:2023-07-10
  • Contact: Yanhui LI
  • About author:HUANG Shuo, born in 1998, M. S. candidate. His research interests include data privacy, differential privacy.
    LI Yanhui, born in 1989, Ph. D., lecturer. Her research interests include data privacy, differential privacy, big data analysis.
    CAO Jianqiu, born in 1967, M. S., professor. His main research interests include graphics and image processing, information visualization, traffic informatization, intelligent control.
  • Supported by:
    National Natural Science Foundation of China(62002036);Opening Project of Shanghai Key Laboratory of Integrated Administration Technologies for Information Security(AGK2020006);Natural Science Foundation of Chongqing(cstc2021jcyj-msxmX0859);Science and Technology Research Program of Chongqing Municipal Education Commission(KJQN202000707)

摘要:

序列数据中可能包含大量敏感信息,因此直接对序列数据的频繁模式进行挖掘存在泄露用户隐私信息的风险。本地化差分隐私(LDP)能够抵御具有任意背景知识的攻击者,可以对敏感信息提供更全面的保护。序列数据内在序列性和高维度的特点为LDP应用于频繁序列模式挖掘带来了挑战。为解决这个问题,提出一种满足ε-LDP的top-k频繁序列模式挖掘算法PrivSPM。该算法结合填充和采样技术、自适应频率估计算法与频繁项预测技术来构造候选集;基于新域,利用基于指数机制的策略对用户数据进行扰动,并结合频率估计算法识别最终的频繁序列模式。理论分析证明了该算法满足ε-LDP。在3个真实数据集上的实验结果表明,PrivSPM算法在纳真率(TPR)和归一化累积排名(NCR)上明显高于对比算法,能有效提高挖掘结果的准确度。

关键词: 本地化差分隐私, 隐私保护, 频繁序列模式挖掘, 指数机制, 数据挖掘

Abstract:

Sequential data may contain a lot of sensitive information, so that directly mining frequent patterns of sequential data would carry significant risk to privacy of individuals. By resisting attackers with any background knowledge, Local Differential Privacy (LDP) can provide more comprehensive protection for sensitive information. Due to the inherent sequentiality and high-dimensionality, it is challenging to mine frequent sequential patterns with the application of LDP. To tackle this problem, a top-k frequent sequential pattern mining algorithm satisfying ε-LDP, called PrivSPM, was proposed. In this algorithm, filling and sampling technologies, adaptive frequency estimation algorithm and frequent item prediction technology were integrated to construct candidate item. Based on the new domain, an exponential mechanism based strategy was employed to perturb the user data, and the final frequent sequential patterns were identified by combining the frequency estimation algorithm. Theoretical analysis proves that the proposed algorithm satisfies ε-LDP. Experimental results on three real datasets demonstrate that PrivSPM algorithm performs better than the comparison algorithm on True Positive Rate (TPR) and Normalized Cumulative Rank (NCR), and can improve the accuracy of mined results effectively.

Key words: Local Differential Privacy (LDP), privacy protection, frequent sequential pattern mining, exponential mechanism, data mining

中图分类号: