Journal of Computer Applications (《计算机应用》), 2025, Vol. 45, Issue (1): 127-135. DOI: 10.11772/j.issn.1001-9081.2024010068

• Cyberspace security

Federated learning-based statistical prediction and differential privacy protection method for location big data

Yan YAN (晏燕), Xingying QIAN (钱星颖), Pengbin YAN (闫鹏斌), Jie YANG (杨杰)

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou Gansu 730050, China
  • Received: 2024-01-19 Revised: 2024-04-05 Accepted: 2024-04-07 Online: 2024-05-09 Published: 2025-01-10
  • Contact: Xingying QIAN
  • About author: YAN Yan, born in 1980, from Lanzhou, Gansu, Ph. D., professor, CCF senior member. Her research interests include privacy protection, information security.
    YAN Pengbin, born in 1998, from Luoyang, Henan, M. S. candidate. His research interests include location privacy protection.
    YANG Jie, born in 1999, from Shangqiu, Henan, M. S. candidate. His research interests include machine learning, privacy protection.
  • Supported by:
    National Natural Science Foundation of China (62361036); Natural Science Foundation of Gansu Province (22JR5RA279)

Abstract:

To address the information silo problem and the risk of location privacy leakage caused by distributed location big data collection, a statistical prediction and privacy protection method for location big data was proposed on the basis of federated learning. Firstly, a horizontal federated learning-based statistical prediction release framework was constructed for location big data. The framework allowed the data collectors in each administrative region to keep their raw data locally, while multiple participants collaboratively trained the prediction model by exchanging training parameters. Secondly, PVTv2-CBAM was designed for the statistical prediction of location big data density with spatiotemporal sequence characteristics, in order to improve the accuracy of the prediction results at clients. Finally, a dynamic allocation and adjustment algorithm for the differential privacy budget was proposed and combined with the MMA (Modified Moments Accountant) mechanism to achieve differential privacy protection of the client models. Experimental results show that compared with Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, and Convolutional LSTM (ConvLSTM) models, PVTv2-CBAM reduces the Mean Absolute Error (MAE) of prediction by 0 to 62% on the Yellow_tripdata dataset and by 39% to 44% on the T-Driver trajectory dataset; with adjustment thresholds of 0.3 and 0.7, the proposed differential privacy budget dynamic allocation and adjustment algorithm improves the model prediction accuracy by about 5% and 6%, respectively, compared with no dynamic adjustment. These results verify the feasibility and effectiveness of the proposed method.
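The parameter exchange in the horizontal federated framework can be illustrated with a minimal sketch. It assumes FedAvg-style aggregation, in which the server averages flattened client parameter vectors weighted by each client's sample count; the abstract does not state which aggregation rule the method actually uses, and the function name `fed_avg` is hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_params: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """FedAvg-style aggregation of flattened client parameter vectors.

    Raw location records never leave the clients; the server only sees one
    parameter vector per client and weights it by that client's sample count.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need a non-empty list with one sample count per client")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same model shape")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        weight = n / total
        for i, value in enumerate(params):
            global_params[i] += weight * value
    return global_params


if __name__ == "__main__":
    # Two hypothetical regional collectors holding 100 and 300 training samples.
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```

Weighting by sample count keeps a region with few records from pulling the global model as strongly as a data-rich region; an unweighted mean would be the simpler alternative.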

Key words: location big data, location privacy, federated learning, differential privacy, deep learning
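Density statistics with spatiotemporal sequence characteristics are commonly obtained by counting location records per grid cell and per time slot, which yields a sequence of 2D frames that a prediction model can consume. The sketch below assumes records of (timestamp, latitude, longitude), a rectangular bounding box, and a fixed slot length; this gridding scheme and the name `density_frames` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (timestamp in seconds, latitude, longitude).
Record = Tuple[float, float, float]


def density_frames(records: Iterable[Record],
                   bbox: Tuple[float, float, float, float],
                   rows: int, cols: int,
                   t0: float, slot_seconds: float,
                   num_slots: int) -> List[List[List[int]]]:
    """Count records per (time slot, grid row, grid column).

    bbox is (min_lat, min_lon, max_lat, max_lon). Records outside the box or
    outside [t0, t0 + num_slots * slot_seconds) are skipped.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    frames = [[[0] * cols for _ in range(rows)] for _ in range(num_slots)]
    for ts, lat, lon in records:
        slot = int((ts - t0) // slot_seconds)
        if not 0 <= slot < num_slots:
            continue
        if not (min_lat <= lat < max_lat and min_lon <= lon < max_lon):
            continue
        # Clamp guards against floating-point rounding right at the upper edge.
        r = min(int((lat - min_lat) / (max_lat - min_lat) * rows), rows - 1)
        c = min(int((lon - min_lon) / (max_lon - min_lon) * cols), cols - 1)
        frames[slot][r][c] += 1
    return frames
```

In such a setup, a window of consecutive frames would typically serve as the model input and the following frame as the prediction target.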

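Client-side differential privacy in federated training is typically applied by clipping each model update to a bounded L2 norm and adding Gaussian noise, while a moments-accountant-style mechanism tracks the cumulative privacy loss over training rounds. The sketch below shows only the clip-and-noise step and the classic single-release Gaussian calibration; it does not implement MMA or the proposed dynamic budget allocation and adjustment algorithm, and all function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def clip_update(update: Sequence[float], clip_norm: float) -> List[float]:
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize_update(update: Sequence[float], clip_norm: float,
                     noise_multiplier: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip the update, then add N(0, (noise_multiplier * clip_norm)^2) noise to each coordinate."""
    if rng is None:
        rng = random.Random()
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clip_update(update, clip_norm)]


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Classic Gaussian-mechanism noise scale for one (epsilon, delta) release.

    Valid for 0 < epsilon < 1; repeated releases must be composed with an
    accountant, which this function does not do.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
```

For example, `privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0))` gives a reproducible noisy update. In `gaussian_sigma`, a larger per-release budget epsilon yields a smaller noise scale, which is the trade-off any budget allocation scheme has to balance against prediction accuracy.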