Journal of Computer Applications

    Next Articles

Secure and efficient frequency estimation method based on shuffled differential privacy

  

  • Received:2024-07-01 Revised:2024-10-14 Online:2024-11-19 Published:2024-11-19
  • Supported by:
    National Nature Science Foundation of China;Nature Science Foundation of Gansu Province

安全高效的混洗差分隐私频率估计方法

晏燕1,李飞飞2,吕雅琴2,冯涛1   

  1. 1. 兰州理工大学
    2. 甘肃省兰州市七里河区兰州理工大学彭家坪校区
  • 通讯作者: 李飞飞
  • 基金资助:
    国家自然科学基金;甘肃省自然科学基金

Abstract: The shuffled differential privacy model is able to balance the degree of privacy protection at user side and the usability of published results at server side. Therefore, it is more suitable for privacy-preserving big data collection and statistical publishing scenarios. Aiming at the problems of low shuffling efficiency and insufficient security of existing shuffled differential privacy frequency estimation methods, a Shuffled Differential Privacy Blind Signature Algorithm (SDPBSA) based on optimized elliptic curve was firstly designed, which achieves identification of tampered or forged information and improves security of shuffling process. Then, a Matrix Column Rearrangement Transposition Shuffling Algorithm (MCRT) was proposed to achieve data shuffling by random matrix column rearrangement and matrix transposition operations, which improves efficiency of shuffling process. Finally, above methods were combined to construct a complete Shuffled Differential Privacy Frequency Estimation Privacy Protection Framework (SM-SDP), and its privacy and error level were analyzed theoretically. Through experiments on datasets such as Normal, Zipf, and Integrated Public Use Microdata Series (IPUMS), it is demonstrated that the MCRT algorithm improves shuffling efficiency by about 1 to 2 orders of magnitude compared to algorithms such as Fisher-Yates, Oblivious Recursive Shuffling (ORShuffle), Message Random Shuffling (MRS), etc. The SM-SDP framework reduces mean square error by about 2 to 11 orders of magnitude in the presence of different proportions of malicious data compared to frequency estimation methods such as Dummy Point with Randomized Response (mixDUMP), Personalized Differential Privacy in Shuffle Model (PSDP), Histogram Publication with Shuffled Differential Privacy (HP-SDP), etc.

Key words: privacy protection, frequency estimation, shuffled differential privacy, blind signature, matrix operation

摘要: 混洗差分隐私模型能够兼顾用户端的隐私保护程度和服务器端发布结果的可用性,因此更适用于隐私保护大数据收集和统计发布场景。针对目前混洗差分隐私频率估计方法洗牌效率较低、混洗过程安全性不足等问题。首先设计基于优化椭圆曲线的混洗差分隐私盲签名算法SDPBSA,实现对篡改或伪造信息的鉴别,提高了混洗过程的安全性;其次提出矩阵列重排转置洗牌算法MCRT,利用随机的矩阵列重排和矩阵转置操作实现数据混洗,提高了混洗过程的效率;最后结合上述方法构建完整的混洗差分隐私频率估计隐私保护框架SM-SDP,理论分析了隐私性和误差级别。在Normal、Zipf和IPUMS等数据集上实验,证明MCRT算法的洗牌效率相较于Fisher-Yates、ORShuffle和MRS等算法提升了约1~2个数量级;SM-SDP框架相较于mixDUMP、PSDP和HP-SDP等频率估计方法,在不同比例恶意数据存在时均方误差降低约2~11个数量级。

关键词: 隐私保护, 频率估计, 混洗差分隐私, 盲签名, 矩阵运算

CLC Number: