Secure and efficient frequency estimation method based on shuffled differential privacy

Journal of Computer Applications, 2025, Vol. 45, Issue (8): 2600-2611. DOI: 10.11772/j.issn.1001-9081.2024070911

Cyber security

Yan YAN, Feifei LI, Yaqin LYU, Tao FENG

School of Computer and Communication, Lanzhou University of Technology, Lanzhou Gansu 730050, China

Received: 2024-06-30 Revised: 2024-10-14 Accepted: 2024-10-16 Online: 2024-11-19 Published: 2025-08-10
Contact: Feifei LI
About the authors: YAN Yan, born in 1980, Ph. D., professor, CCF senior member. Her research interests include privacy protection and information security.
LYU Yaqin, born in 2000, M. S. candidate. Her research interests include location privacy protection.
FENG Tao, born in 1970, Ph. D., research fellow. His research interests include modern cryptography theory and information security.
Supported by: National Natural Science Foundation of China (62361036); Natural Science Foundation of Gansu Province (22JR5RA279)

Abstract

Shuffled Differential Privacy (SDP) models balance the degree of privacy protection on the user side against the usability of published results on the server side, which makes them well suited to privacy-preserving big data collection and statistical publishing. To address the low shuffling efficiency and insufficient shuffling-process security of existing SDP frequency estimation methods, the following work was performed. Firstly, an SDP Blind Signature Algorithm (SDPBSA) was designed on the basis of an optimized elliptic curve to identify tampered or forged information, thereby improving the security of the shuffling process. Secondly, a Matrix Column Rearrangement Transposition (MCRT) shuffling method was proposed, which shuffles data through random matrix column rearrangement and matrix transposition operations, thereby improving the efficiency of the shuffling process. Finally, the above methods were combined to construct a complete SDP frequency estimation privacy protection framework, SM-SDP (SDP based on blind Signature and Matrix column rearrangement transposition), and its privacy and error level were analyzed theoretically. Experimental results on datasets such as Normal, Zipf, and IPUMS (Integrated Public Use Microdata Series) show that the MCRT shuffling method improves shuffling efficiency by about 1 to 2 orders of magnitude compared with shuffling methods such as Fisher-Yates, ORShuffle (Oblivious Recursive Shuffling), and MRS (Message Random Shuffling), and that the SM-SDP framework reduces the Mean Squared Error (MSE) by 2 to 11 orders of magnitude in the presence of different proportions of malicious data compared with frequency estimation methods such as mixDUMP, PSDP (Personalized Differential Privacy in Shuffle model), and HP-SDP (Histogram Publication with SDP).

Key words: privacy protection, frequency estimation, Shuffled Differential Privacy (SDP), blind signature, matrix operation

CLC Number: TP309

Yan YAN, Feifei LI, Yaqin LYU, Tao FENG. Secure and efficient frequency estimation method based on shuffled differential privacy[J]. Journal of Computer Applications, 2025, 45(8): 2600-2611.

Figures/Tables 14

Fig. 1 Privacy protection framework for frequency estimation based on shuffled differential privacy

Fig. 2 Flow of SDPBSA

Fig. 3 Example diagram of MCRT method
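The figure itself is not reproduced here, so the following is only a minimal sketch of the idea the abstract describes: shuffling by a random column rearrangement followed by a transposition. The function name `mcrt_shuffle`, the `n_cols` parameter, and the requirement that the input length be divisible by `n_cols` are assumptions made for illustration, not the paper's actual procedure, which may pad, iterate, or choose the matrix shape differently.

```python
import random


def mcrt_shuffle(items, n_cols, rng=None):
    """Hypothetical sketch of one column-rearrangement + transposition pass.

    The matrix shape and the single-pass design are assumptions; they are
    not taken from the paper.
    """
    rng = rng or random.Random()
    n = len(items)
    if n_cols <= 0 or n % n_cols != 0:
        raise ValueError("len(items) must be a positive multiple of n_cols")
    n_rows = n // n_cols

    # Lay the reports out row by row as an n_rows x n_cols matrix.
    matrix = [items[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]

    # Draw a random order for the columns.
    col_order = list(range(n_cols))
    rng.shuffle(col_order)

    # Rearranging the columns and then transposing means each rearranged
    # column becomes a row of the result; every element is touched once.
    transposed = [[matrix[r][c] for r in range(n_rows)] for c in col_order]

    # Flatten the transposed matrix back into a single list.
    return [value for row in transposed for value in row]


reports = list(range(12))
shuffled = mcrt_shuffle(reports, n_cols=4, rng=random.Random(0))
print(shuffled)
```

Each element is read and written once, which is in line with the linear cost the paper reports for MCRT. A single pass like this one keeps the elements of each original column adjacent, so on its own it does not produce a uniformly random permutation.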

Tab. 1 Experimental dataset information

Dataset	Number of users	Maximum user data value
Normal	600 000	600
Zipf	600 000	600
IPUMS	602 325	915
Kosarak	990 002	41 270

Tab. 2 Time complexity of various shuffling methods

Shuffling method	Time complexity
Fisher-Yates	$O(n^2)$
ORShuffle	$O(n (\log n)^2)$
MRS	$O(n \log n)$
MCRT	$O(n)$
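The quadratic entry for Fisher-Yates in Tab. 2 matches the original strike-out formulation of that shuffle, in which removing a chosen element from the remaining pool costs linear time. Which variant the authors timed is not stated in this section, so reading the entry that way is an assumption. The sketch below contrasts that formulation with the well-known in-place variant (Durstenfeld), which needs only one swap per element; both function names are illustrative.

```python
import random


def fisher_yates_strike_out(items, rng):
    """Original formulation: repeatedly pick and remove a random element.

    list.pop(k) shifts the tail of the list, so n removals cost O(n^2).
    """
    pool = list(items)
    out = []
    while pool:
        k = rng.randrange(len(pool))
        out.append(pool.pop(k))
    return out


def fisher_yates_in_place(items, rng):
    """In-place variant: one swap per position, O(n) overall."""
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        a[i], a[j] = a[j], a[i]
    return a


data = list(range(10))
print(fisher_yates_strike_out(data, random.Random(1)))
print(fisher_yates_in_place(data, random.Random(1)))
```

Both functions return a permutation of the input; they differ only in how much work each step costs.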

Tab. 3 Comparison of running time of various shuffling methods

Shuffling method	Normal	Zipf	IPUMS	Kosarak
Fisher-Yates	55.599 681	56.246 354	65.479 238	539.611 728
ORShuffle	38.012 453	37.446 202	47.264 105	76.035 698
MRS	10.392 418	9.536 152	19.193 321	19.062 021
MCRT	0.445 108	0.466 110	0.516 073	1.508 853
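As a quick check of the "1 to 2 orders of magnitude" claim against the numbers in Tab. 3, the running times can be turned into speedup ratios. The snippet below is a small arithmetic sketch; the values are copied from the table, while the variable and function names are illustrative. The time unit is not stated in this section, which does not matter for ratios.

```python
import math

# Running times copied from Tab. 3 (unit not stated in this section).
DATASETS = ["Normal", "Zipf", "IPUMS", "Kosarak"]
BASELINES = {
    "Fisher-Yates": [55.599681, 56.246354, 65.479238, 539.611728],
    "ORShuffle": [38.012453, 37.446202, 47.264105, 76.035698],
    "MRS": [10.392418, 9.536152, 19.193321, 19.062021],
}
MCRT = [0.445108, 0.466110, 0.516073, 1.508853]


def speedups(baseline_times, mcrt_times):
    """Element-wise ratio baseline / MCRT, one value per dataset."""
    return [b / m for b, m in zip(baseline_times, mcrt_times)]


for method, times in BASELINES.items():
    for dataset, ratio in zip(DATASETS, speedups(times, MCRT)):
        print(f"{method:12s} {dataset:8s} {ratio:7.1f}x  (10^{math.log10(ratio):.2f})")
```

Every ratio computed this way lies between 10 and 1000, which is consistent with the improvement range stated in the abstract.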

Fig. 4 MSE comparison of different methods on Normal dataset

Fig. 5 MSE comparison of different methods on Zipf dataset

Fig. 6 MSE comparison of different methods on IPUMS dataset

Fig. 7 MSE comparison of different methods on Kosarak dataset

Tab. 4 MSE comparison of different methods on Normal dataset when malicious data exist

Method	Malicious data ratio/%	Privacy budget
		$\varepsilon = 0.2$	$\varepsilon = 0.4$	$\varepsilon = 0.6$	$\varepsilon = 0.8$	$\varepsilon = 1.0$
mixDUMP	10	$7.82 \times 10^{-3}$	$7.63 \times 10^{-3}$	$7.51 \times 10^{-3}$	$7.12 \times 10^{-3}$	$7.04 \times 10^{-3}$
	20	$2.41 \times 10^{-2}$	$2.32 \times 10^{-2}$	$1.83 \times 10^{-2}$	$1.62 \times 10^{-2}$	$1.51 \times 10^{-2}$
	40	$6.48 \times 10^{-2}$	$6.32 \times 10^{-2}$	$6.03 \times 10^{-2}$	$5.91 \times 10^{-2}$	$5.57 \times 10^{-2}$
PSDP	10	$5.39 \times 10^{-3}$	$5.24 \times 10^{-3}$	$5.21 \times 10^{-3}$	$5.16 \times 10^{-3}$	$4.88 \times 10^{-3}$
	20	$9.72 \times 10^{-3}$	$9.50 \times 10^{-3}$	$9.23 \times 10^{-3}$	$8.68 \times 10^{-3}$	$8.47 \times 10^{-3}$
	40	$3.60 \times 10^{-2}$	$3.46 \times 10^{-2}$	$3.37 \times 10^{-2}$	$3.11 \times 10^{-2}$	$2.85 \times 10^{-2}$
HP-SDP	10	$5.53 \times 10^{-5}$	$5.20 \times 10^{-5}$	$5.03 \times 10^{-5}$	$4.75 \times 10^{-5}$	$4.58 \times 10^{-5}$
	20	$8.35 \times 10^{-5}$	$8.14 \times 10^{-5}$	$7.91 \times 10^{-5}$	$7.42 \times 10^{-5}$	$7.13 \times 10^{-5}$
	40	$5.65 \times 10^{-4}$	$5.54 \times 10^{-4}$	$5.38 \times 10^{-4}$	$4.95 \times 10^{-4}$	$4.63 \times 10^{-4}$
OD-HP	10	$4.49 \times 10^{-5}$	$4.23 \times 10^{-5}$	$4.10 \times 10^{-5}$	$3.85 \times 10^{-5}$	$3.57 \times 10^{-5}$
	20	$6.81 \times 10^{-5}$	$6.72 \times 10^{-5}$	$6.50 \times 10^{-5}$	$6.21 \times 10^{-5}$	$6.04 \times 10^{-5}$
	40	$3.94 \times 10^{-4}$	$3.62 \times 10^{-4}$	$3.55 \times 10^{-4}$	$3.27 \times 10^{-4}$	$3.10 \times 10^{-4}$
Laplace	10	$2.32 \times 10^{-6}$	$2.18 \times 10^{-6}$	$2.02 \times 10^{-6}$	$1.86 \times 10^{-6}$	$1.71 \times 10^{-6}$
	20	$4.38 \times 10^{-6}$	$4.26 \times 10^{-6}$	$4.18 \times 10^{-6}$	$3.93 \times 10^{-6}$	$3.81 \times 10^{-6}$
	40	$1.23 \times 10^{-5}$	$1.08 \times 10^{-5}$	$9.95 \times 10^{-6}$	$9.77 \times 10^{-6}$	$9.61 \times 10^{-6}$
SM-SDP	10	$6.54 \times 10^{-10}$	$6.70 \times 10^{-11}$	$2.37 \times 10^{-11}$	$9.22 \times 10^{-12}$	$6.23 \times 10^{-12}$
	20	$6.43 \times 10^{-10}$	$7.18 \times 10^{-11}$	$2.80 \times 10^{-11}$	$9.35 \times 10^{-12}$	$6.51 \times 10^{-12}$
	40	$8.05 \times 10^{-10}$	$6.82 \times 10^{-11}$	$2.49 \times 10^{-11}$	$1.85 \times 10^{-11}$	$6.94 \times 10^{-12}$
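To relate the entries of Tab. 4 to the "orders of magnitude" wording of the abstract, the gap between a baseline and SM-SDP can be expressed as the base-10 logarithm of the MSE ratio. The sketch below does this for the 10% malicious-data rows at the smallest and largest privacy budgets; the values are copied from the table, and the helper name `orders_of_magnitude` is illustrative.

```python
import math

# MSE values from Tab. 4, 10% malicious data, at epsilon = 0.2 and epsilon = 1.0.
BASELINE_MSE = {
    "mixDUMP": (7.82e-3, 7.04e-3),
    "PSDP": (5.39e-3, 4.88e-3),
    "HP-SDP": (5.53e-5, 4.58e-5),
}
SM_SDP_MSE = (6.54e-10, 6.23e-12)


def orders_of_magnitude(baseline_mse, proposed_mse):
    """log10 of the MSE ratio: how many powers of ten separate the two."""
    return math.log10(baseline_mse / proposed_mse)


for method, values in BASELINE_MSE.items():
    gaps = [orders_of_magnitude(b, p) for b, p in zip(values, SM_SDP_MSE)]
    print(f"{method:8s} eps=0.2: {gaps[0]:.2f}  eps=1.0: {gaps[1]:.2f}")
```

For these rows the gaps all fall inside the 2 to 11 range quoted in the abstract, and they widen as the privacy budget grows because the SM-SDP error drops much faster than the baselines' error.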

Tab. 5 MSE comparison of different methods on Zipf dataset when malicious data exist

Method	Malicious data ratio/%	Privacy budget
		$\varepsilon = 0.2$	$\varepsilon = 0.4$	$\varepsilon = 0.6$	$\varepsilon = 0.8$	$\varepsilon = 1.0$
mixDUMP	10	$3.01 \times 10^{-2}$	$2.81 \times 10^{-2}$	$2.58 \times 10^{-2}$	$2.32 \times 10^{-2}$	$2.00 \times 10^{-2}$
	20	$6.01 \times 10^{-2}$	$5.80 \times 10^{-2}$	$5.54 \times 10^{-2}$	$5.32 \times 10^{-2}$	$5.01 \times 10^{-2}$
	40	$2.02 \times 10^{-1}$	$1.80 \times 10^{-1}$	$1.68 \times 10^{-1}$	$1.53 \times 10^{-1}$	$1.49 \times 10^{-1}$
PSDP	10	$1.74 \times 10^{-2}$	$1.59 \times 10^{-2}$	$1.44 \times 10^{-2}$	$1.18 \times 10^{-2}$	$9.85 \times 10^{-3}$
	20	$3.63 \times 10^{-2}$	$3.58 \times 10^{-2}$	$3.40 \times 10^{-2}$	$3.27 \times 10^{-2}$	$3.00 \times 10^{-2}$
	40	$9.22 \times 10^{-2}$	$9.16 \times 10^{-2}$	$8.90 \times 10^{-2}$	$8.73 \times 10^{-2}$	$8.66 \times 10^{-2}$
HP-SDP	10	$7.56 \times 10^{-4}$	$7.45 \times 10^{-4}$	$7.24 \times 10^{-4}$	$6.62 \times 10^{-4}$	$6.42 \times 10^{-4}$
	20	$1.45 \times 10^{-3}$	$1.37 \times 10^{-3}$	$1.20 \times 10^{-3}$	$1.04 \times 10^{-3}$	$9.61 \times 10^{-4}$
	40	$8.67 \times 10^{-3}$	$8.53 \times 10^{-3}$	$8.36 \times 10^{-3}$	$8.04 \times 10^{-3}$	$7.75 \times 10^{-3}$
OD-HP	10	$5.45 \times 10^{-4}$	$5.42 \times 10^{-4}$	$5.37 \times 10^{-4}$	$5.10 \times 10^{-4}$	$4.84 \times 10^{-4}$
	20	$7.89 \times 10^{-4}$	$7.70 \times 10^{-4}$	$7.51 \times 10^{-4}$	$7.46 \times 10^{-4}$	$7.22 \times 10^{-4}$
	40	$4.56 \times 10^{-3}$	$4.71 \times 10^{-3}$	$4.24 \times 10^{-3}$	$4.08 \times 10^{-3}$	$3.64 \times 10^{-3}$
Laplace	10	$3.02 \times 10^{-6}$	$2.91 \times 10^{-6}$	$2.66 \times 10^{-6}$	$2.50 \times 10^{-6}$	$2.23 \times 10^{-6}$
	20	$3.87 \times 10^{-6}$	$3.75 \times 10^{-6}$	$3.61 \times 10^{-6}$	$3.47 \times 10^{-6}$	$3.36 \times 10^{-6}$
	40	$3.79 \times 10^{-5}$	$3.60 \times 10^{-5}$	$3.48 \times 10^{-5}$	$3.16 \times 10^{-5}$	$3.01 \times 10^{-5}$
SM-SDP	10	$4.12 \times 10^{-9}$	$1.43 \times 10^{-9}$	$4.18 \times 10^{-10}$	$1.32 \times 10^{-10}$	$8.00 \times 10^{-11}$
	20	$4.77 \times 10^{-9}$	$2.51 \times 10^{-9}$	$5.20 \times 10^{-10}$	$2.31 \times 10^{-10}$	$7.56 \times 10^{-11}$
	40	$6.51 \times 10^{-9}$	$1.60 \times 10^{-9}$	$4.54 \times 10^{-10}$	$1.15 \times 10^{-10}$	$7.25 \times 10^{-11}$

Tab. 6 MSE comparison of different methods on IPUMS dataset when malicious data exist

Method	Malicious data ratio/%	Privacy budget
		$\varepsilon = 0.2$	$\varepsilon = 0.4$	$\varepsilon = 0.6$	$\varepsilon = 0.8$	$\varepsilon = 1.0$
mixDUMP	10	$6.76 \times 10^{-2}$	$6.51 \times 10^{-2}$	$6.31 \times 10^{-2}$	$5.83 \times 10^{-2}$	$5.62 \times 10^{-2}$
	20	$8.18 \times 10^{-2}$	$8.07 \times 10^{-2}$	$7.74 \times 10^{-2}$	$7.59 \times 10^{-2}$	$7.43 \times 10^{-2}$
	40	$4.66 \times 10^{-1}$	$4.56 \times 10^{-1}$	$4.20 \times 10^{-1}$	$3.83 \times 10^{-1}$	$3.51 \times 10^{-1}$
PSDP	10	$6.13 \times 10^{-2}$	$6.25 \times 10^{-2}$	$5.74 \times 10^{-2}$	$5.50 \times 10^{-2}$	$5.29 \times 10^{-2}$
	20	$7.32 \times 10^{-2}$	$7.20 \times 10^{-2}$	$7.07 \times 10^{-2}$	$6.81 \times 10^{-2}$	$6.64 \times 10^{-2}$
	40	$2.95 \times 10^{-1}$	$2.73 \times 10^{-1}$	$2.45 \times 10^{-1}$	$2.32 \times 10^{-1}$	$2.16 \times 10^{-1}$
HP-SDP	10	$3.78 \times 10^{-4}$	$3.46 \times 10^{-4}$	$3.37 \times 10^{-4}$	$3.21 \times 10^{-4}$	$2.91 \times 10^{-4}$
	20	$6.22 \times 10^{-4}$	$5.93 \times 10^{-4}$	$5.62 \times 10^{-4}$	$5.43 \times 10^{-4}$	$5.12 \times 10^{-4}$
	40	$2.81 \times 10^{-3}$	$2.65 \times 10^{-3}$	$2.58 \times 10^{-3}$	$2.06 \times 10^{-3}$	$1.71 \times 10^{-3}$
OD-HP	10	$1.66 \times 10^{-4}$	$1.41 \times 10^{-4}$	$1.30 \times 10^{-4}$	$1.19 \times 10^{-4}$	$1.05 \times 10^{-4}$
	20	$5.17 \times 10^{-4}$	$5.36 \times 10^{-4}$	$5.05 \times 10^{-4}$	$4.77 \times 10^{-4}$	$4.63 \times 10^{-4}$
	40	$9.72 \times 10^{-4}$	$9.50 \times 10^{-4}$	$9.46 \times 10^{-4}$	$9.20 \times 10^{-4}$	$9.03 \times 10^{-4}$
Laplace	10	$2.49 \times 10^{-5}$	$2.32 \times 10^{-5}$	$2.10 \times 10^{-5}$	$1.98 \times 10^{-5}$	$1.81 \times 10^{-5}$
	20	$4.30 \times 10^{-5}$	$4.18 \times 10^{-5}$	$3.84 \times 10^{-5}$	$3.77 \times 10^{-5}$	$3.63 \times 10^{-5}$
	40	$1.12 \times 10^{-4}$	$9.85 \times 10^{-5}$	$9.51 \times 10^{-5}$	$9.25 \times 10^{-5}$	$9.00 \times 10^{-5}$
SM-SDP	10	$3.41 \times 10^{-9}$	$6.18 \times 10^{-11}$	$3.64 \times 10^{-11}$	$8.72 \times 10^{-12}$	$5.35 \times 10^{-12}$
	20	$4.55 \times 10^{-9}$	$7.10 \times 10^{-11}$	$3.13 \times 10^{-11}$	$9.25 \times 10^{-12}$	$4.97 \times 10^{-12}$
	40	$4.29 \times 10^{-9}$	$6.03 \times 10^{-11}$	$4.14 \times 10^{-11}$	$8.69 \times 10^{-12}$	$6.28 \times 10^{-12}$

Tab. 7 MSE comparison of different methods on Kosarak dataset when malicious data exist

Method	Malicious data ratio/%	Privacy budget
		$\varepsilon = 0.2$	$\varepsilon = 0.4$	$\varepsilon = 0.6$	$\varepsilon = 0.8$	$\varepsilon = 1.0$
mixDUMP	10	$5.17 \times 10^{-3}$	$5.02 \times 10^{-3}$	$4.65 \times 10^{-3}$	$4.40 \times 10^{-3}$	$4.36 \times 10^{-3}$
	20	$7.54 \times 10^{-3}$	$7.31 \times 10^{-3}$	$7.03 \times 10^{-3}$	$6.73 \times 10^{-3}$	$6.82 \times 10^{-3}$
	40	$3.51 \times 10^{-2}$	$3.37 \times 10^{-2}$	$3.09 \times 10^{-2}$	$2.88 \times 10^{-2}$	$2.62 \times 10^{-2}$
PSDP	10	$2.46 \times 10^{-3}$	$2.21 \times 10^{-3}$	$2.17 \times 10^{-3}$	$1.83 \times 10^{-3}$	$1.68 \times 10^{-3}$
	20	$6.79 \times 10^{-3}$	$6.93 \times 10^{-3}$	$6.42 \times 10^{-3}$	$6.26 \times 10^{-3}$	$6.01 \times 10^{-3}$
	40	$1.83 \times 10^{-2}$	$1.65 \times 10^{-2}$	$1.57 \times 10^{-2}$	$1.20 \times 10^{-2}$	$1.14 \times 10^{-2}$
HP-SDP	10	$2.83 \times 10^{-5}$	$2.65 \times 10^{-5}$	$2.36 \times 10^{-5}$	$2.04 \times 10^{-5}$	$1.72 \times 10^{-5}$
	20	$5.29 \times 10^{-5}$	$5.07 \times 10^{-5}$	$4.86 \times 10^{-5}$	$4.70 \times 10^{-5}$	$4.51 \times 10^{-5}$
	40	$2.72 \times 10^{-4}$	$2.54 \times 10^{-4}$	$2.13 \times 10^{-4}$	$1.83 \times 10^{-4}$	$1.66 \times 10^{-4}$
OD-HP	10	$9.66 \times 10^{-6}$	$9.50 \times 10^{-6}$	$9.24 \times 10^{-6}$	$8.81 \times 10^{-6}$	$8.75 \times 10^{-6}$
	20	$3.51 \times 10^{-5}$	$3.32 \times 10^{-5}$	$3.08 \times 10^{-5}$	$2.85 \times 10^{-5}$	$2.74 \times 10^{-5}$
	40	$9.83 \times 10^{-5}$	$9.57 \times 10^{-5}$	$9.40 \times 10^{-5}$	$9.06 \times 10^{-5}$	$8.91 \times 10^{-5}$
Laplace	10	$4.77 \times 10^{-6}$	$4.42 \times 10^{-6}$	$4.31 \times 10^{-6}$	$4.15 \times 10^{-6}$	$3.92 \times 10^{-6}$
	20	$7.12 \times 10^{-6}$	$6.83 \times 10^{-6}$	$6.76 \times 10^{-6}$	$6.61 \times 10^{-6}$	$6.58 \times 10^{-6}$
	40	$2.63 \times 10^{-5}$	$2.49 \times 10^{-5}$	$2.26 \times 10^{-5}$	$2.17 \times 10^{-5}$	$1.91 \times 10^{-5}$
SM-SDP	10	$5.46 \times 10^{-8}$	$1.42 \times 10^{-9}$	$6.83 \times 10^{-10}$	$5.30 \times 10^{-10}$	$2.12 \times 10^{-10}$
	20	$6.72 \times 10^{-8}$	$1.58 \times 10^{-9}$	$5.04 \times 10^{-10}$	$4.56 \times 10^{-10}$	$2.57 \times 10^{-10}$
	40	$5.91 \times 10^{-8}$	$1.67 \times 10^{-9}$	$6.48 \times 10^{-10}$	$3.70 \times 10^{-10}$	$1.88 \times 10^{-10}$
