Journal of Computer Applications, 2024, Vol. 44, Issue (10): 3114-3121. DOI: 10.11772/j.issn.1001-9081.2023101520

• Cyber security •

Review of histogram publication methods based on differential privacy

Xuebin CHEN1,2,3, Liyang SHAN1,2,3, Rumin GUO1,2,3

  1. College of Sciences, North China University of Science and Technology, Tangshan Hebei 063210, China
  2. Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063210, China
  3. Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan Hebei 063210, China
  • Received: 2023-11-07; Revised: 2024-01-02; Accepted: 2024-01-04; Online: 2024-01-19; Published: 2024-10-10
  • Contact: Xuebin CHEN
  • About author:SHAN Liyang, born in 1997, M. S. candidate. Her research interests include data security, privacy protection.
    GUO Rumin, born in 1998, M. S. candidate. Her research interests include data security, privacy protection.
  • Supported by:
    National Natural Science Foundation of China (U20A20179)

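  • Corresponding author: CHEN Xuebin
  • About the authors: CHEN Xuebin, born in 1970 in Tangshan, Hebei, Ph. D., professor, CCF distinguished member. His research interests include big data security, Internet of Things security, and network security. E-mail: chxb@ncst.edu.cn
    SHAN Liyang, born in 1997 in Baoding, Hebei, M. S. candidate, CCF member. Her research interests include data security and privacy protection.
    GUO Rumin, born in 1998 in Datong, Shanxi, M. S. candidate, CCF member. Her research interests include data security and privacy protection.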

Abstract:

In the era of the digital economy, data publication is an essential part of data sharing. Histograms are a common form of data publication, but publishing them risks leaking private information. To address this concern, histogram publication methods based on Differential Privacy (DP) were reviewed. Firstly, DP and the relevant properties of histograms were introduced, together with research from the past five years, in China and abroad, on histogram publication for static datasets and streaming data; for static data, the number of groups, the grouping method, the balance between noise error and grouping error, and the allocation of the privacy budget were discussed. Secondly, for dynamic data, data sampling, data prediction, and grouping based on sliding windows were explored. For DP histogram publication methods built on interval tree structures, the conversion between the original data and the tree structure was described, and noise addition on tree-structured data, tree-based optimization, and privacy budget allocation over the tree were discussed. Moreover, the utility and privacy of published histogram data, as well as the query range and query accuracy, were discussed. Finally, the relevant algorithms were compared and their advantages and disadvantages were summarized, quantitative comparisons and applicable scenarios were given for some of the algorithms, and future research directions for DP-based histograms in different data scenarios were outlined.

Key words: data publication, histogram publication, differential privacy, privacy budget, Mean Squared Error (MSE)
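
As an illustration of the setting the abstract describes, the following is a minimal sketch of the textbook Laplace mechanism applied to a histogram, with the Mean Squared Error (MSE) between the true and the published counts used as the error measure. It is not one of the surveyed algorithms. It assumes that neighboring datasets differ by adding or removing one record, so that each record affects exactly one bin count by 1 (sensitivity 1). The function names (`laplace_noise`, `build_histogram`, `publish_histogram`, `mean_squared_error`) and the synthetic data are hypothetical and chosen only for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_histogram(values, low, high, num_bins):
    """Count values in `num_bins` equal-width bins over [low, high)."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if low <= v < high:
            index = min(int((v - low) / width), num_bins - 1)
            counts[index] += 1
    return counts


def publish_histogram(counts, epsilon, rng):
    """Add independent Laplace(1 / epsilon) noise to every bin count.

    Assumption: sensitivity is 1 (add/remove-one-record neighbors),
    so the noise scale is sensitivity / epsilon = 1 / epsilon.
    """
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def mean_squared_error(true_counts, noisy_counts):
    """Average squared difference between two equal-length count lists."""
    total = sum((t - n) ** 2 for t, n in zip(true_counts, noisy_counts))
    return total / len(true_counts)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: 10,000 integer values in [0, 100).
    data = [rng.randrange(100) for _ in range(10_000)]
    true_counts = build_histogram(data, 0, 100, 10)
    for epsilon in (0.1, 1.0):
        noisy = publish_histogram(true_counts, epsilon, rng)
        print(epsilon, mean_squared_error(true_counts, noisy))
```

Because the variance of Laplace noise with scale b is 2b², the expected per-bin squared error of this baseline is 2/ε², so a smaller privacy budget ε gives stronger privacy at the cost of a larger MSE. The grouping and budget-allocation methods covered by the review aim to improve on this trade-off.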

CLC Number: