Secure and efficient frequency estimation method based on shuffled differential privacy

Journal of Computer Applications, 2025, Vol. 45, Issue (8): 2600-2611. DOI: 10.11772/j.issn.1001-9081.2024070911

Section: Cyberspace security

Yan YAN, Feifei LI, Yaqin LYU, Tao FENG

School of Computer and Communication, Lanzhou University of Technology, Lanzhou Gansu 730050, China

Received: 2024-06-30 Revised: 2024-10-14 Accepted: 2024-10-16 Online: 2024-11-19 Published: 2025-08-10
Corresponding author: Feifei LI
About the authors: YAN Yan, born in 1980 in Lanzhou, Gansu, Ph. D., professor, CCF senior member. Her research interests include privacy protection and information security.
LYU Yaqin, born in 2000 in Jinzhong, Shanxi, M. S. candidate. Her research interests include location privacy protection.
FENG Tao, born in 1970 in Dingxi, Gansu, Ph. D., research fellow. His research interests include modern cryptography theory and information security.
Supported by:
National Natural Science Foundation of China (62361036); Natural Science Foundation of Gansu Province (22JR5RA279)

Abstract:

The Shuffled Differential Privacy (SDP) model balances the degree of privacy protection at the user side against the usability of the results published at the server side, which makes it well suited to privacy-preserving big data collection and statistical publishing. To address the low shuffling efficiency and insufficient shuffling-process security of existing SDP frequency estimation methods, the following work was performed. Firstly, an SDP Blind Signature Algorithm (SDPBSA) was designed on the basis of an optimized elliptic curve to identify tampered or forged information, thereby improving the security of the shuffling process. Secondly, a Matrix Column Rearrangement Transposition (MCRT) shuffling method was proposed, which shuffles data through random matrix column rearrangement and matrix transposition operations, thereby improving the efficiency of the shuffling process. Finally, these methods were combined to construct a complete SDP frequency estimation privacy protection framework, SM-SDP (SDP based on blind Signature and Matrix column rearrangement transposition), and its privacy and error level were analyzed theoretically. Experimental results on datasets such as Normal, Zipf, and IPUMS (Integrated Public Use Microdata Series) show that the MCRT shuffling method improves shuffling efficiency by 1 to 2 orders of magnitude compared with shuffling methods such as Fisher-Yates, ORShuffle (Oblivious Recursive Shuffling), and MRS (Message Random Shuffling), and that the SM-SDP framework reduces the Mean Squared Error (MSE) by 2 to 11 orders of magnitude in the presence of different proportions of malicious data compared with frequency estimation methods such as mixDUMP, PSDP (Personalized Differential Privacy in Shuffle model), and HP-SDP (Histogram Publication with SDP).

Key words: privacy protection, frequency estimation, Shuffled Differential Privacy (SDP), blind signature, matrix operation
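The abstract does not spell out the steps of SDPBSA, so the following is only a minimal sketch of the general blind-signature idea it relies on: a signer signs a blinded message, and anyone can later detect a tampered or forged message because its signature no longer verifies. It deliberately swaps in the textbook RSA blind signature rather than the paper's elliptic-curve construction, and the key parameters (`P`, `Q`, `E`) and all function names are illustrative assumptions, far too small for real use.

```python
import hashlib
import math
import secrets

# Toy RSA key (assumption for illustration only; NOT the paper's elliptic-curve scheme).
P = 2**61 - 1          # Mersenne prime
Q = 2**89 - 1          # Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the RSA group."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def blind(message: bytes):
    """User side: hide the digest behind a random blinding factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (digest(message) * pow(r, E, N)) % N, r


def sign_blinded(blinded: int) -> int:
    """Signer side: sign without learning the underlying digest."""
    return pow(blinded, D, N)


def unblind(blind_signature: int, r: int) -> int:
    """User side: strip the blinding factor to obtain an ordinary signature."""
    return (blind_signature * pow(r, -1, N)) % N


def verify(message: bytes, signature: int) -> bool:
    """Anyone: accept only if the signature matches the message digest."""
    return pow(signature, E, N) == digest(message)


report = b"perturbed-report:42"
blinded, r = blind(report)
signature = unblind(sign_blinded(blinded), r)
print(verify(report, signature))                  # genuine report
print(verify(b"perturbed-report:43", signature))  # tampered report
```

In a shuffle-model pipeline, a check of this kind would let the receiving party discard reports whose signatures fail, which is the role the abstract assigns to SDPBSA.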

CLC number: TP309

Yan YAN, Feifei LI, Yaqin LYU, Tao FENG. Secure and efficient frequency estimation method based on shuffled differential privacy[J]. Journal of Computer Applications, 2025, 45(8): 2600-2611.

Figures/Tables (14)

Fig. 1 Privacy protection framework for frequency estimation based on shuffled differential privacy

Fig. 2 Flow of SDPBSA

Fig. 3 Example diagram of MCRT method
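The figure itself is not reproduced on this page, so the following is a minimal sketch of what an MCRT-style shuffle could look like, based only on the two operations named in the abstract (random column rearrangement and matrix transposition). The matrix shape `n_cols`, the number of `rounds`, and the function name are assumptions, and the sketch is not claimed to produce a uniformly random permutation or to match the paper's exact procedure.

```python
import random


def mcrt_shuffle(items, n_cols, rounds=2, rng=None):
    """Shuffle `items` by repeated column permutation and transposition.

    Assumptions (not from the source): the reports are laid out row by row
    in a matrix with `n_cols` columns, and the two operations are applied
    for a fixed number of rounds.
    """
    rng = rng or random.Random()
    flat = list(items)
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    if len(flat) % n_cols != 0:
        raise ValueError("len(items) must be a multiple of n_cols")
    if not flat:
        return flat

    cols = n_cols
    for _ in range(rounds):
        rows = len(flat) // cols
        # Lay the data out as a rows x cols matrix.
        matrix = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
        # Random column rearrangement: one permutation applied to every row.
        perm = list(range(cols))
        rng.shuffle(perm)
        matrix = [[row[j] for j in perm] for row in matrix]
        # Transposition: columns become rows (cols x rows matrix).
        transposed = [list(col) for col in zip(*matrix)]
        flat = [x for row in transposed for x in row]
        cols = rows  # the transposed matrix has `rows` columns
    return flat


reports = list(range(12))
print(mcrt_shuffle(reports, n_cols=4, rounds=3, rng=random.Random(0)))
```

Each round touches every element a constant number of times, which is consistent with the linear time complexity reported for MCRT as long as the number of rounds is fixed.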

Tab. 1 Experimental dataset information

Dataset	Number of users	Maximum value of user data
Normal	600 000	600
Zipf	600 000	600
IPUMS	602 325	915
Kosarak	990 002	41 270

Tab. 2 Time complexity of various shuffling methods

Shuffling method	Time complexity
Fisher-Yates	$O(n^2)$
ORShuffle	$O(n (\log n)^2)$
MRS	$O(n \log n)$
MCRT	$O(n)$
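The quadratic cost listed for Fisher-Yates is consistent with the original strike-out form of the algorithm, where removing a randomly chosen element from the remaining pool costs linear time. The in-place variant commonly attributed to Durstenfeld runs in linear time. Which variant the paper benchmarked is not stated on this page, so the sketch below simply shows both for comparison, with hypothetical function names.

```python
import random


def fisher_yates_original(items, rng=None):
    """Strike-out form: O(n^2) because each pop from the pool is O(n)."""
    rng = rng or random.Random()
    pool = list(items)
    out = []
    while pool:
        k = rng.randrange(len(pool))
        out.append(pool.pop(k))  # removing from the middle shifts the tail
    return out


def fisher_yates_inplace(items, rng=None):
    """In-place swap form: O(n), one swap per position."""
    rng = rng or random.Random()
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive upper bound
        a[i], a[j] = a[j], a[i]
    return a


print(fisher_yates_original(range(8), random.Random(0)))
print(fisher_yates_inplace(range(8), random.Random(0)))
```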

Tab. 3 Comparison of running time of various shuffling methods (s)

Shuffling method	Normal	Zipf	IPUMS	Kosarak
Fisher-Yates	55.599 681	56.246 354	65.479 238	539.611 728
ORShuffle	38.012 453	37.446 202	47.264 105	76.035 698
MRS	10.392 418	9.536 152	19.193 321	19.062 021
MCRT	0.445 108	0.466 110	0.516 073	1.508 853
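As a quick check of the "1 to 2 orders of magnitude" claim, the running times in Tab. 3 can be turned into speedup ratios of MCRT over each baseline. The numbers below are copied from the table; the dictionary layout and helper name are just for illustration.

```python
import math

DATASETS = ["Normal", "Zipf", "IPUMS", "Kosarak"]

# Running times in seconds, copied from Tab. 3.
RUNTIME_S = {
    "Fisher-Yates": [55.599681, 56.246354, 65.479238, 539.611728],
    "ORShuffle": [38.012453, 37.446202, 47.264105, 76.035698],
    "MRS": [10.392418, 9.536152, 19.193321, 19.062021],
    "MCRT": [0.445108, 0.466110, 0.516073, 1.508853],
}


def speedup(baseline, dataset):
    """Ratio of the baseline's running time to MCRT's on one dataset."""
    i = DATASETS.index(dataset)
    return RUNTIME_S[baseline][i] / RUNTIME_S["MCRT"][i]


for method in ("Fisher-Yates", "ORShuffle", "MRS"):
    for dataset in DATASETS:
        ratio = speedup(method, dataset)
        print(f"{method:12s} {dataset:8s} {ratio:8.1f}x  "
              f"(10^{math.log10(ratio):.2f})")
```

Every ratio falls between 10x and 1000x, that is, between one and three orders of magnitude, with the largest gap against Fisher-Yates on Kosarak.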

Fig. 4 MSE comparison of different methods on Normal dataset

Fig. 5 MSE comparison of different methods on Zipf dataset

Fig. 6 MSE comparison of different methods on IPUMS dataset

Fig. 7 MSE comparison of different methods on Kosarak dataset

Tab. 4 MSE comparison of different methods on Normal dataset when malicious data exist

Method	Malicious data ratio/%	Privacy budget $\varepsilon = 0.2$	$\varepsilon = 0.4$	$\varepsilon = 0.6$	$\varepsilon = 0.8$	$\varepsilon = 1.0$
mixDUMP	10	$7.82 \times 10^{-3}$	$7.63 \times 10^{-3}$	$7.51 \times 10^{-3}$	$7.12 \times 10^{-3}$	$7.04 \times 10^{-3}$
mixDUMP	20	$2.41 \times 10^{-2}$	$2.32 \times 10^{-2}$	$1.83 \times 10^{-2}$	$1.62 \times 10^{-2}$	$1.51 \times 10^{-2}$
mixDUMP	40	$6.48 \times 10^{-2}$	$6.32 \times 10^{-2}$	$6.03 \times 10^{-2}$	$5.91 \times 10^{-2}$	$5.57 \times 10^{-2}$
PSDP	10	$5.39 \times 10^{-3}$	$5.24 \times 10^{-3}$	$5.21 \times 10^{-3}$	$5.16 \times 10^{-3}$	$4.88 \times 10^{-3}$
PSDP	20	$9.72 \times 10^{-3}$	$9.50 \times 10^{-3}$	$9.23 \times 10^{-3}$	$8.68 \times 10^{-3}$	$8.47 \times 10^{-3}$
PSDP	40	$3.60 \times 10^{-2}$	$3.46 \times 10^{-2}$	$3.37 \times 10^{-2}$	$3.11 \times 10^{-2}$	$2.85 \times 10^{-2}$
HP-SDP	10	$5.53 \times 10^{-5}$	$5.20 \times 10^{-5}$	$5.03 \times 10^{-5}$	$4.75 \times 10^{-5}$	$4.58 \times 10^{-5}$
HP-SDP	20	$8.35 \times 10^{-5}$	$8.14 \times 10^{-5}$	$7.91 \times 10^{-5}$	$7.42 \times 10^{-5}$	$7.13 \times 10^{-5}$
HP-SDP	40	$5.65 \times 10^{-4}$	$5.54 \times 10^{-4}$	$5.38 \times 10^{-4}$	$4.95 \times 10^{-4}$	$4.63 \times 10^{-4}$
OD-HP	10	$4.49 \times 10^{-5}$	$4.23 \times 10^{-5}$	$4.10 \times 10^{-5}$	$3.85 \times 10^{-5}$	$3.57 \times 10^{-5}$
OD-HP	20	$6.81 \times 10^{-5}$	$6.72 \times 10^{-5}$	$6.50 \times 10^{-5}$	$6.21 \times 10^{-5}$	$6.04 \times 10^{-5}$
OD-HP	40	$3.94 \times 10^{-4}$	$3.62 \times 10^{-4}$	$3.55 \times 10^{-4}$	$3.27 \times 10^{-4}$	$3.10 \times 10^{-4}$
Laplace	10	$2.32 \times 10^{-6}$	$2.18 \times 10^{-6}$	$2.02 \times 10^{-6}$	$1.86 \times 10^{-6}$	$1.71 \times 10^{-6}$
Laplace	20	$4.38 \times 10^{-6}$	$4.26 \times 10^{-6}$	$4.18 \times 10^{-6}$	$3.93 \times 10^{-6}$	$3.81 \times 10^{-6}$
Laplace	40	$1.23 \times 10^{-5}$	$1.08 \times 10^{-5}$	$9.95 \times 10^{-6}$	$9.77 \times 10^{-6}$	$9.61 \times 10^{-6}$
SM-SDP	10	$6.54 \times 10^{-10}$	$6.70 \times 10^{-11}$	$2.37 \times 10^{-11}$	$9.22 \times 10^{-12}$	$6.23 \times 10^{-12}$
SM-SDP	20	$6.43 \times 10^{-10}$	$7.18 \times 10^{-11}$	$2.80 \times 10^{-11}$	$9.35 \times 10^{-12}$	$6.51 \times 10^{-12}$
SM-SDP	40	$8.05 \times 10^{-10}$	$6.82 \times 10^{-11}$	$2.49 \times 10^{-11}$	$1.85 \times 10^{-11}$	$6.94 \times 10^{-12}$
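The page does not give the formula behind the MSE values. A minimal sketch, assuming the usual definition for frequency estimation (the mean over all domain values of the squared difference between true and estimated frequency), could look as follows. The helper names and the toy data are hypothetical.

```python
from collections import Counter


def frequencies(values, domain_size):
    """Relative frequency of each value in 0..domain_size-1."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    n = len(values)
    return [counts.get(v, 0) / n for v in range(domain_size)]


def mse(true_freq, est_freq):
    """Mean squared error between two frequency vectors of equal length."""
    if len(true_freq) != len(est_freq) or not true_freq:
        raise ValueError("frequency vectors must be non-empty and equal length")
    return sum((t - e) ** 2 for t, e in zip(true_freq, est_freq)) / len(true_freq)


# Toy example: true data versus a hypothetical estimate.
true_f = frequencies([0, 0, 0, 0, 0, 1, 1, 1, 2, 2], domain_size=3)  # [0.5, 0.3, 0.2]
est_f = [0.48, 0.33, 0.19]
print(mse(true_f, est_f))
```

Under this definition, forged reports shift the estimated frequencies directly, which is why the error of the baselines grows with the malicious data ratio.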

Tab. 5 MSE comparison of different methods on Zipf dataset when malicious data exist

Method	Malicious data ratio/%	Privacy budget $\varepsilon = 0.2$	$\varepsilon = 0.4$	$\varepsilon = 0.6$	$\varepsilon = 0.8$	$\varepsilon = 1.0$
mixDUMP	10	$3.01 \times 10^{-2}$	$2.81 \times 10^{-2}$	$2.58 \times 10^{-2}$	$2.32 \times 10^{-2}$	$2.00 \times 10^{-2}$
mixDUMP	20	$6.01 \times 10^{-2}$	$5.80 \times 10^{-2}$	$5.54 \times 10^{-2}$	$5.32 \times 10^{-2}$	$5.01 \times 10^{-2}$
mixDUMP	40	$2.02 \times 10^{-1}$	$1.80 \times 10^{-1}$	$1.68 \times 10^{-1}$	$1.53 \times 10^{-1}$	$1.49 \times 10^{-1}$
PSDP	10	$1.74 \times 10^{-2}$	$1.59 \times 10^{-2}$	$1.44 \times 10^{-2}$	$1.18 \times 10^{-2}$	$9.85 \times 10^{-3}$
PSDP	20	$3.63 \times 10^{-2}$	$3.58 \times 10^{-2}$	$3.40 \times 10^{-2}$	$3.27 \times 10^{-2}$	$3.00 \times 10^{-2}$
PSDP	40	$9.22 \times 10^{-2}$	$9.16 \times 10^{-2}$	$8.90 \times 10^{-2}$	$8.73 \times 10^{-2}$	$8.66 \times 10^{-2}$
HP-SDP	10	$7.56 \times 10^{-4}$	$7.45 \times 10^{-4}$	$7.24 \times 10^{-4}$	$6.62 \times 10^{-4}$	$6.42 \times 10^{-4}$
HP-SDP	20	$1.45 \times 10^{-3}$	$1.37 \times 10^{-3}$	$1.20 \times 10^{-3}$	$1.04 \times 10^{-3}$	$9.61 \times 10^{-4}$
HP-SDP	40	$8.67 \times 10^{-3}$	$8.53 \times 10^{-3}$	$8.36 \times 10^{-3}$	$8.04 \times 10^{-3}$	$7.75 \times 10^{-3}$
OD-HP	10	$5.45 \times 10^{-4}$	$5.42 \times 10^{-4}$	$5.37 \times 10^{-4}$	$5.10 \times 10^{-4}$	$4.84 \times 10^{-4}$
OD-HP	20	$7.89 \times 10^{-4}$	$7.70 \times 10^{-4}$	$7.51 \times 10^{-4}$	$7.46 \times 10^{-4}$	$7.22 \times 10^{-4}$
OD-HP	40	$4.56 \times 10^{-3}$	$4.71 \times 10^{-3}$	$4.24 \times 10^{-3}$	$4.08 \times 10^{-3}$	$3.64 \times 10^{-3}$
Laplace	10	$3.02 \times 10^{-6}$	$2.91 \times 10^{-6}$	$2.66 \times 10^{-6}$	$2.50 \times 10^{-6}$	$2.23 \times 10^{-6}$
Laplace	20	$3.87 \times 10^{-6}$	$3.75 \times 10^{-6}$	$3.61 \times 10^{-6}$	$3.47 \times 10^{-6}$	$3.36 \times 10^{-6}$
Laplace	40	$3.79 \times 10^{-5}$	$3.60 \times 10^{-5}$	$3.48 \times 10^{-5}$	$3.16 \times 10^{-5}$	$3.01 \times 10^{-5}$
SM-SDP	10	$4.12 \times 10^{-9}$	$1.43 \times 10^{-9}$	$4.18 \times 10^{-10}$	$1.32 \times 10^{-10}$	$8.00 \times 10^{-11}$
SM-SDP	20	$4.77 \times 10^{-9}$	$2.51 \times 10^{-9}$	$5.20 \times 10^{-10}$	$2.31 \times 10^{-10}$	$7.56 \times 10^{-11}$
SM-SDP	40	$6.51 \times 10^{-9}$	$1.60 \times 10^{-9}$	$4.54 \times 10^{-10}$	$1.15 \times 10^{-10}$	$7.25 \times 10^{-11}$

Tab. 6 MSE comparison of different methods on IPUMS dataset when malicious data exist

Method	Malicious data ratio/%	Privacy budget $\varepsilon = 0.2$	$\varepsilon = 0.4$	$\varepsilon = 0.6$	$\varepsilon = 0.8$	$\varepsilon = 1.0$
mixDUMP	10	$6.76 \times 10^{-2}$	$6.51 \times 10^{-2}$	$6.31 \times 10^{-2}$	$5.83 \times 10^{-2}$	$5.62 \times 10^{-2}$
mixDUMP	20	$8.18 \times 10^{-2}$	$8.07 \times 10^{-2}$	$7.74 \times 10^{-2}$	$7.59 \times 10^{-2}$	$7.43 \times 10^{-2}$
mixDUMP	40	$4.66 \times 10^{-1}$	$4.56 \times 10^{-1}$	$4.20 \times 10^{-1}$	$3.83 \times 10^{-1}$	$3.51 \times 10^{-1}$
PSDP	10	$6.13 \times 10^{-2}$	$6.25 \times 10^{-2}$	$5.74 \times 10^{-2}$	$5.50 \times 10^{-2}$	$5.29 \times 10^{-2}$
PSDP	20	$7.32 \times 10^{-2}$	$7.20 \times 10^{-2}$	$7.07 \times 10^{-2}$	$6.81 \times 10^{-2}$	$6.64 \times 10^{-2}$
PSDP	40	$2.95 \times 10^{-1}$	$2.73 \times 10^{-1}$	$2.45 \times 10^{-1}$	$2.32 \times 10^{-1}$	$2.16 \times 10^{-1}$
HP-SDP	10	$3.78 \times 10^{-4}$	$3.46 \times 10^{-4}$	$3.37 \times 10^{-4}$	$3.21 \times 10^{-4}$	$2.91 \times 10^{-4}$
HP-SDP	20	$6.22 \times 10^{-4}$	$5.93 \times 10^{-4}$	$5.62 \times 10^{-4}$	$5.43 \times 10^{-4}$	$5.12 \times 10^{-4}$
HP-SDP	40	$2.81 \times 10^{-3}$	$2.65 \times 10^{-3}$	$2.58 \times 10^{-3}$	$2.06 \times 10^{-3}$	$1.71 \times 10^{-3}$
OD-HP	10	$1.66 \times 10^{-4}$	$1.41 \times 10^{-4}$	$1.30 \times 10^{-4}$	$1.19 \times 10^{-4}$	$1.05 \times 10^{-4}$
OD-HP	20	$5.17 \times 10^{-4}$	$5.36 \times 10^{-4}$	$5.05 \times 10^{-4}$	$4.77 \times 10^{-4}$	$4.63 \times 10^{-4}$
OD-HP	40	$9.72 \times 10^{-4}$	$9.50 \times 10^{-4}$	$9.46 \times 10^{-4}$	$9.20 \times 10^{-4}$	$9.03 \times 10^{-4}$
Laplace	10	$2.49 \times 10^{-5}$	$2.32 \times 10^{-5}$	$2.10 \times 10^{-5}$	$1.98 \times 10^{-5}$	$1.81 \times 10^{-5}$
Laplace	20	$4.30 \times 10^{-5}$	$4.18 \times 10^{-5}$	$3.84 \times 10^{-5}$	$3.77 \times 10^{-5}$	$3.63 \times 10^{-5}$
Laplace	40	$1.12 \times 10^{-4}$	$9.85 \times 10^{-5}$	$9.51 \times 10^{-5}$	$9.25 \times 10^{-5}$	$9.00 \times 10^{-5}$
SM-SDP	10	$3.41 \times 10^{-9}$	$6.18 \times 10^{-11}$	$3.64 \times 10^{-11}$	$8.72 \times 10^{-12}$	$5.35 \times 10^{-12}$
SM-SDP	20	$4.55 \times 10^{-9}$	$7.10 \times 10^{-11}$	$3.13 \times 10^{-11}$	$9.25 \times 10^{-12}$	$4.97 \times 10^{-12}$
SM-SDP	40	$4.29 \times 10^{-9}$	$6.03 \times 10^{-11}$	$4.14 \times 10^{-11}$	$8.69 \times 10^{-12}$	$6.28 \times 10^{-12}$

Tab. 7 MSE comparison of different methods on Kosarak dataset when malicious data exist

Method	Malicious data ratio/%	Privacy budget $\varepsilon = 0.2$	$\varepsilon = 0.4$	$\varepsilon = 0.6$	$\varepsilon = 0.8$	$\varepsilon = 1.0$
mixDUMP	10	$5.17 \times 10^{-3}$	$5.02 \times 10^{-3}$	$4.65 \times 10^{-3}$	$4.40 \times 10^{-3}$	$4.36 \times 10^{-3}$
mixDUMP	20	$7.54 \times 10^{-3}$	$7.31 \times 10^{-3}$	$7.03 \times 10^{-3}$	$6.73 \times 10^{-3}$	$6.82 \times 10^{-3}$
mixDUMP	40	$3.51 \times 10^{-2}$	$3.37 \times 10^{-2}$	$3.09 \times 10^{-2}$	$2.88 \times 10^{-2}$	$2.62 \times 10^{-2}$
PSDP	10	$2.46 \times 10^{-3}$	$2.21 \times 10^{-3}$	$2.17 \times 10^{-3}$	$1.83 \times 10^{-3}$	$1.68 \times 10^{-3}$
PSDP	20	$6.79 \times 10^{-3}$	$6.93 \times 10^{-3}$	$6.42 \times 10^{-3}$	$6.26 \times 10^{-3}$	$6.01 \times 10^{-3}$
PSDP	40	$1.83 \times 10^{-2}$	$1.65 \times 10^{-2}$	$1.57 \times 10^{-2}$	$1.20 \times 10^{-2}$	$1.14 \times 10^{-2}$
HP-SDP	10	$2.83 \times 10^{-5}$	$2.65 \times 10^{-5}$	$2.36 \times 10^{-5}$	$2.04 \times 10^{-5}$	$1.72 \times 10^{-5}$
HP-SDP	20	$5.29 \times 10^{-5}$	$5.07 \times 10^{-5}$	$4.86 \times 10^{-5}$	$4.70 \times 10^{-5}$	$4.51 \times 10^{-5}$
HP-SDP	40	$2.72 \times 10^{-4}$	$2.54 \times 10^{-4}$	$2.13 \times 10^{-4}$	$1.83 \times 10^{-4}$	$1.66 \times 10^{-4}$
OD-HP	10	$9.66 \times 10^{-6}$	$9.50 \times 10^{-6}$	$9.24 \times 10^{-6}$	$8.81 \times 10^{-6}$	$8.75 \times 10^{-6}$
OD-HP	20	$3.51 \times 10^{-5}$	$3.32 \times 10^{-5}$	$3.08 \times 10^{-5}$	$2.85 \times 10^{-5}$	$2.74 \times 10^{-5}$
OD-HP	40	$9.83 \times 10^{-5}$	$9.57 \times 10^{-5}$	$9.40 \times 10^{-5}$	$9.06 \times 10^{-5}$	$8.91 \times 10^{-5}$
Laplace	10	$4.77 \times 10^{-6}$	$4.42 \times 10^{-6}$	$4.31 \times 10^{-6}$	$4.15 \times 10^{-6}$	$3.92 \times 10^{-6}$
Laplace	20	$7.12 \times 10^{-6}$	$6.83 \times 10^{-6}$	$6.76 \times 10^{-6}$	$6.61 \times 10^{-6}$	$6.58 \times 10^{-6}$
Laplace	40	$2.63 \times 10^{-5}$	$2.49 \times 10^{-5}$	$2.26 \times 10^{-5}$	$2.17 \times 10^{-5}$	$1.91 \times 10^{-5}$
SM-SDP	10	$5.46 \times 10^{-8}$	$1.42 \times 10^{-9}$	$6.83 \times 10^{-10}$	$5.30 \times 10^{-10}$	$2.12 \times 10^{-10}$
SM-SDP	20	$6.72 \times 10^{-8}$	$1.58 \times 10^{-9}$	$5.04 \times 10^{-10}$	$4.56 \times 10^{-10}$	$2.57 \times 10^{-10}$
SM-SDP	40	$5.91 \times 10^{-8}$	$1.67 \times 10^{-9}$	$6.48 \times 10^{-10}$	$3.70 \times 10^{-10}$	$1.88 \times 10^{-10}$
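To relate the tables to the "2 to 11 orders of magnitude" figure in the abstract, the gap between a baseline and SM-SDP can be expressed as the base-10 logarithm of the MSE ratio. The sketch below uses two cells copied from Tab. 6 and Tab. 7, picked by hand as examples of a small and a large gap. They are not claimed to be the exact extremes over all four tables.

```python
import math


def orders_of_magnitude(baseline_mse, sm_sdp_mse):
    """log10 of how many times smaller the SM-SDP error is."""
    return math.log10(baseline_mse / sm_sdp_mse)


# Tab. 7 (Kosarak), 10% malicious data, epsilon = 0.2: HP-SDP vs SM-SDP.
small_gap = orders_of_magnitude(2.83e-5, 5.46e-8)

# Tab. 6 (IPUMS), 40% malicious data, epsilon = 1.0: mixDUMP vs SM-SDP.
large_gap = orders_of_magnitude(3.51e-1, 6.28e-12)

print(f"HP-SDP on Kosarak: about {small_gap:.1f} orders of magnitude")
print(f"mixDUMP on IPUMS:  about {large_gap:.1f} orders of magnitude")
```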

