Federated learning-based statistical prediction and differential privacy protection method for location big data

Journal of Computer Applications, 2025, Vol. 45, Issue (1): 127-135. DOI: 10.11772/j.issn.1001-9081.2024010068

Yan YAN, Xingying QIAN, Pengbin YAN, Jie YANG

School of Computer and Communication, Lanzhou University of Technology, Lanzhou Gansu 730050, China

Received: 2024-01-19 Revised: 2024-04-05 Accepted: 2024-04-07 Online: 2024-05-09 Published: 2025-01-10
Contact: Xingying QIAN
About author: YAN Yan, born in 1980 in Lanzhou, Gansu, Ph. D., professor, CCF senior member. Her research interests include privacy protection and information security.
YAN Pengbin, born in 1998 in Luoyang, Henan, M. S. candidate. His research interests include location privacy protection.
YANG Jie, born in 1999 in Shangqiu, Henan, M. S. candidate. His research interests include machine learning and privacy protection.
Supported by:
National Natural Science Foundation of China (62361036); Natural Science Foundation of Gansu Province (22JR5RA279)

Abstract:

To address the information silo problem and the risk of location privacy leakage caused by distributed location big data collection, a statistical prediction and privacy protection method for location big data was proposed on the basis of federated learning. Firstly, a statistical prediction release framework for location big data was constructed on the basis of horizontal federated learning. The framework allowed the data collectors of each administrative region to keep their raw data, and enabled multiple participants to complete the training task of the prediction model collaboratively by exchanging training parameters. Secondly, to address the problem of predicting density statistics of location big data with spatiotemporal sequence characteristics, PVTv2-CBAM was designed to improve the accuracy of prediction results at clients. Finally, a dynamic allocation and adjustment algorithm for the differential privacy budget was proposed and combined with the MMA (Modified Moments Accountant) mechanism to achieve differential privacy protection of the client models. Experimental results show that, compared with Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, and Convolutional LSTM (ConvLSTM) models, the proposed PVTv2-CBAM reduces the Mean Absolute Error (MAE) of prediction by 0 to 62% on the Yellow_tripdata dataset and by 39% to 44% on the T-Driver trajectory dataset; the proposed dynamic allocation and adjustment algorithm for the differential privacy budget improves the model prediction accuracy by about 5% and 6% at adjustment thresholds of 0.3 and 0.7, respectively, compared with no dynamic adjustment. These results verify the feasibility and effectiveness of the proposed method.

Key words: location big data, location privacy, federated learning, differential privacy, deep learning
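
The abstract describes clients that keep their raw data, exchange only training parameters, and protect those parameters with differential privacy. The following is a minimal sketch of one such communication round, not the authors' implementation. It assumes sample-count-weighted averaging in the style of FedAvg, L2 clipping of each client update, and Gaussian noise with a fixed standard deviation; the function names, `clip_norm`, and `sigma` are hypothetical, and the paper's dynamic budget allocation and MMA accounting are not reproduced here.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def add_gaussian_noise(update, sigma, rng):
    """Add independent N(0, sigma^2) noise to every coordinate."""
    return [u + rng.gauss(0.0, sigma) for u in update]


def federated_average(client_updates, client_sizes):
    """Average client updates, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_updates[0])
    return [
        sum(upd[i] * n for upd, n in zip(client_updates, client_sizes)) / total
        for i in range(dim)
    ]


def run_round(global_params, local_updates, client_sizes, clip_norm, sigma, seed=0):
    """One round: clip and perturb each client update, then aggregate on the server."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    protected = [
        add_gaussian_noise(clip_update(u, clip_norm), sigma, rng)
        for u in local_updates
    ]
    aggregated = federated_average(protected, client_sizes)
    return [p + a for p, a in zip(global_params, aggregated)]


if __name__ == "__main__":
    # Two hypothetical clients with two-parameter models.
    new_params = run_round(
        global_params=[0.0, 0.0],
        local_updates=[[3.0, 4.0], [0.5, -0.5]],
        client_sizes=[100, 300],
        clip_norm=1.0,
        sigma=0.1,
    )
    print(new_params)
```

Clipping bounds how much any single client can move the global model, which is what lets the added noise be calibrated to a known sensitivity. A larger `sigma` gives stronger protection at the cost of prediction accuracy.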

CLC Number: TP309

Yan YAN, Xingying QIAN, Pengbin YAN, Jie YANG. Federated learning-based statistical prediction and differential privacy protection method for location big data[J]. Journal of Computer Applications, 2025, 45(1): 127-135.

Figures and Tables (16)

Fig. 1 Training process of federated learning

Fig. 2 Differential privacy protection framework based on horizontal federated learning

Fig. 3 Structure of PVTv2-CBAM

Fig. 4 Regional division of Manhattan Yellow_tripdata dataset

Fig. 5 Regional division of Beijing T-Driver dataset

Tab. 1 Computational costs and parameter counts of different models

Model	Floating-point operations/FLOPs	Parameters/10⁶
CNN	15.50	0.174
LSTM	13.46	0.166
ConvLSTM	16.16	0.196
PVTv2	19.72	0.159
PVTv2-CBAM	19.90	0.163

Tab. 2 Accuracy evaluation metrics of different models on Yellow_tripdata dataset

Model	MAE	RMSE	H(P,Q)
CNN	0.060±0.059	0.133±0.068	0.084±0.002
LSTM	0.119±0.050	0.123±0.060	0.123±0.005
ConvLSTM	0.159±0.088	0.185±0.045	0.119±0.010
PVTv2	0.097±0.083	0.127±0.073	0.051±0.013
PVTv2-CBAM	0.060±0.007	0.093±0.019	0.053±0.005
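
For readers unfamiliar with the first two metric columns, the sketch below computes MAE and RMSE under their standard definitions. The input values are made-up illustrations, not data from the paper, and the third metric, H(P,Q), is omitted because this page does not define it.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the average squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    # Hypothetical density values for three grid cells.
    true_density = [0.2, 0.5, 0.3]
    predicted_density = [0.1, 0.5, 0.5]
    print(mae(true_density, predicted_density))
    print(rmse(true_density, predicted_density))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two columns can rank models differently.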

Tab. 3 Accuracy evaluation metrics of different models on T-Driver dataset

Model	MAE	RMSE	H(P,Q)
CNN	0.071±0.010	0.086±0.035	0.065±0.010
LSTM	0.075±0.040	0.060±0.039	0.023±0.013
ConvLSTM	0.069±0.011	0.043±0.002	0.084±0.010
PVTv2	0.041±0.021	0.045±0.011	0.038±0.014
PVTv2-CBAM	0.042±0.029	0.034±0.017	0.033±0.011
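
The MAE reductions quoted in the abstract (0 to 62% and 39% to 44%) can be checked against the mean MAE values in Tables 2 and 3. The sketch below does that arithmetic; the helper name `mae_reduction_percent` is hypothetical, and only the mean values from the tables are used (the ± spreads are ignored).

```python
# Mean MAE values copied from Tab. 2 (Yellow_tripdata) and Tab. 3 (T-Driver).
YELLOW_MAE = {"CNN": 0.060, "LSTM": 0.119, "ConvLSTM": 0.159, "PVTv2-CBAM": 0.060}
TDRIVER_MAE = {"CNN": 0.071, "LSTM": 0.075, "ConvLSTM": 0.069, "PVTv2-CBAM": 0.042}


def mae_reduction_percent(table, proposed="PVTv2-CBAM"):
    """Relative MAE reduction of the proposed model against each baseline, in percent."""
    ours = table[proposed]
    return {
        model: round((value - ours) / value * 100, 1)
        for model, value in table.items()
        if model != proposed
    }


if __name__ == "__main__":
    print(mae_reduction_percent(YELLOW_MAE))
    print(mae_reduction_percent(TDRIVER_MAE))
```

On Yellow_tripdata the reductions come out at roughly 0%, 50%, and 62% against CNN, LSTM, and ConvLSTM; on T-Driver they are roughly 41%, 44%, and 39%, which matches the ranges stated in the abstract.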

Tab. 4 Results of ablation experiments

PVTv2 (baseline)	CBAM	Differential privacy	Yellow_tripdata MAE	Yellow_tripdata RMSE	Yellow_tripdata H(P,Q)	T-Driver MAE	T-Driver RMSE	T-Driver H(P,Q)
✓			0.097±0.083	0.127±0.073	0.051±0.013	0.041±0.021	0.045±0.011	0.038±0.014
✓	✓		0.060±0.007	0.093±0.019	0.053±0.005	0.042±0.029	0.034±0.017	0.033±0.011
✓		✓	0.094±0.007	0.137±0.051	0.089±0.021	0.039±0.014	0.051±0.016	0.038±0.018
✓	✓	✓	0.054±0.015	0.089±0.030	0.058±0.012	0.042±0.003	0.037±0.021	0.034±0.011

Tab. 5 Prediction metrics with different privacy budgets

ε	Yellow_tripdata MAE	Yellow_tripdata RMSE	Yellow_tripdata H(P,Q)	T-Driver MAE	T-Driver RMSE	T-Driver H(P,Q)
0.5	0.031	0.045	0.022	0.027	0.041	0.038
1.0	0.024	0.046	0.021	0.026	0.037	0.039
2.0	0.017	0.040	0.018	0.024	0.039	0.036
4.0	0.016	0.038	0.019	0.025	0.042	0.034
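
The table shows errors that tend to fall as the privacy budget ε grows, most clearly for MAE on Yellow_tripdata. One way to see why is that a larger budget permits less noise. The sketch below illustrates this with the classical Laplace mechanism, whose noise scale is sensitivity divided by ε. This is a stand-in chosen for simplicity: the paper uses the MMA mechanism, whose noise calibration is not described on this page, and the unit sensitivity here is an assumption.

```python
def laplace_scale(sensitivity, epsilon):
    """Noise scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


if __name__ == "__main__":
    # The privacy budgets listed in the table, with an assumed sensitivity of 1.
    for eps in (0.5, 1.0, 2.0, 4.0):
        print(f"epsilon={eps}: Laplace scale={laplace_scale(1.0, eps)}")
```

Doubling ε halves the noise scale, so the perturbed model parameters stay closer to their unperturbed values, at the cost of a weaker privacy guarantee.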

Fig. 6 Experimental results of model on Yellow_tripdata dataset with different privacy budgets

Fig. 7 Experimental results of model on T-Driver dataset with different privacy budgets

Fig. 8 Experimental results of model on Yellow_tripdata dataset with different standard deviations

Fig. 9 Experimental results of model on T-Driver dataset with different standard deviations

Fig. 10 Influence of threshold α on prediction accuracies on different datasets

Fig. 11 Influence of privacy budget allocation methods on predicted values
