《计算机应用》唯一官方网站 ›› 2023, Vol. 43 ›› Issue (2): 413-422.DOI: 10.11772/j.issn.1001-9081.2021122161

• 数据科学与技术 • 上一篇    

基于BCU-Tree与字典的高效用挖掘快速脱敏算法

尹春勇(), 李荧   

  1. 南京信息工程大学 计算机与软件学院,南京 210044
  • 收稿日期:2021-12-27 修回日期:2022-02-19 接受日期:2022-02-23 发布日期:2022-05-16 出版日期:2023-02-10
  • 通讯作者: 尹春勇
  • 作者简介:李荧(1995—),女,江苏淮安人,硕士研究生,主要研究方向:数据挖掘、隐私保护。
  • 基金资助:
    国家自然科学基金资助项目(61772282)

Fast sanitization algorithm based on BCU-Tree and dictionary for high-utility mining

Chunyong YIN(), Ying LI   

  1. School of Computer and Software,Nanjing University of Information Science and Technology,Nanjing Jiangsu 210044,China
  • Received:2021-12-27 Revised:2022-02-19 Accepted:2022-02-23 Online:2022-05-16 Published:2023-02-10
  • Contact: Chunyong YIN
  • About author:LI Ying, born in 1995, M. S. candidate. Her research interests include data mining, privacy preserving.
  • Supported by:
    National Natural Science Foundation of China(61772282)

摘要:

针对隐私保护效用挖掘(PPUM)中脱敏时间长、计算复杂度高,以及算法副作用大等问题,提出一种基于BCU-Tree和字典(BCUTD)的高效用挖掘快速脱敏算法。该算法提出了一种新的树结构BCU-Tree来存储敏感项信息,基于按位运算符编码模型降低树的构建时间并减小搜索空间。采用字典表存储树结构中的所有节点,修改敏感项时只需访问字典表,最终达到数据库脱敏目的。在4个不同的数据集上进行的实验中,BCUTD算法在脱敏时间和副作用上的表现要明显优于经典的优先隐藏高效用项(HHUIF)算法、最大敏感效用-最大项效用(MSU-MAU)算法和使用树与表结构的快速扰动(FPUTT)算法。实验结果表明,BCUTD算法能够有效减少脱敏时间,降低算法副作用以及计算复杂度。

关键词: 敏感信息, 高效用挖掘, 隐私保护, 字典, 位图编码

Abstract:

Privacy Preserving Utility Mining (PPUM) has problems of long sanitization time, high computational complexity, and high side effect. To solve these problems, a fast sanitization algorithm based on BCU-Tree and Dictionary (BCUTD) for high-utility mining was proposed. In the algorithm, a new tree structure called BCU-Tree was presented to store sensitive item information, and based on the bitwise operator coding model, the tree construction time and search space were reduced. The dictionary table was used to store all nodes in the tree structure, and only the dictionary table needed to be accessed when the sensitive item was modified. Finally, the sanitization process was completed. In the experiments on four different datasets, BCUTD algorithm has better performance on sanitization time and high side effect than Hiding High Utility Item First (HHUIF), Maximum Sensitive Utility-MAximum item Utility (MSU-MAU), and Fast Perturbation Using Tree and Table structures (FPUTT). Experimental results show that BCUTD algorithm can effectively speed up the sanitization process, reduce the side effect and computational complexity of the algorithm.

Key words: sensitive information, high-utility mining, privacy preserving, dictionary, bitmap code

中图分类号: