Journal of Computer Applications (《计算机应用》), sole official website


Dual vertical federated learning framework incorporating secret sharing technology

Luo Wei (罗玮), Liu Jinquan (刘金全), Zhang Zheng (张铮)

  1. 国能大渡河大数据服务有限公司 (Guoneng Dadu River Big Data Service Co., Ltd.)
  • Received: 2023-07-03 Revised: 2023-09-13 Online: 2023-11-03 Published: 2023-11-03
  • Corresponding author: Luo Wei
  • Supported by:
    Key R&D Plan Projects in Sichuan Province

Abstract: To address cross-media data fusion modeling and privacy protection in the hydropower industry, a dual vertical federated learning framework incorporating secret sharing technology was proposed. First, the participant nodes were stratified: bottom-layer nodes performed preliminary modeling, intermediate-layer nodes aggregated and optimized the preliminary models, and central nodes generated the final model. Second, to strengthen data privacy protection and defend against inference attacks, an intermediate parameter protection mechanism based on secret sharing technology was introduced. In this mechanism, the communication data between the data owner and the model trainer was split into fragments, which concealed the correspondence between model parameters and trainers and thereby made inference attacks more difficult. Finally, to optimize the model aggregation process of federated learning, a node evaluation mechanism based on information disparity was introduced. This mechanism considered both node dissimilarity and data volume when weighting each node in model aggregation, and it excluded the contributions of suspected malicious nodes, thus improving model performance and convergence speed. Experiments were conducted on real data from Guodian Dadu River Basin Hydropower Development Co., Ltd. The results show that, compared with a differential privacy protection mechanism, the intermediate parameter protection mechanism based on secret sharing converges more stably and 14.2% faster; compared with the Federated Averaging algorithm, the node evaluation mechanism based on information disparity reduces model error by 3.7% and increases convergence speed by 13.4%. The proposed solution therefore addresses cross-media data fusion modeling for hydropower data while offering data privacy protection and faster model convergence.
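The idea of fragmenting communicated parameters can be illustrated with additive secret sharing, one common form of secret sharing. The sketch below is only an illustration under stated assumptions and is not the protocol used in the paper, which the abstract does not specify. The modulus `PRIME`, the fixed-point factor `SCALE`, and all function names are hypothetical choices made for this example.

```python
import secrets

# Assumed parameters for illustration only (not taken from the paper).
PRIME = 2**61 - 1   # modulus of the finite field used for the shares
SCALE = 10**6       # fixed-point scaling factor for real-valued parameters


def encode(x: float) -> int:
    """Map a real value to a field element via fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a real value; the upper half is negative."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def split(params: list[float], n_shares: int) -> list[list[int]]:
    """Split every parameter into n_shares additive shares modulo PRIME."""
    shares = [[0] * len(params) for _ in range(n_shares)]
    for j, x in enumerate(params):
        partial = 0
        for i in range(n_shares - 1):
            r = secrets.randbelow(PRIME)      # uniformly random share
            shares[i][j] = r
            partial = (partial + r) % PRIME
        # The last share is chosen so that all shares sum to the encoded value.
        shares[n_shares - 1][j] = (encode(x) - partial) % PRIME
    return shares


def add_shares(a: list[int], b: list[int]) -> list[int]:
    """Add two share vectors element-wise; sharing is additively homomorphic."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


def reconstruct(shares: list[list[int]]) -> list[float]:
    """Recover the parameters by summing all shares modulo PRIME."""
    return [decode(sum(column) % PRIME) for column in zip(*shares)]


if __name__ == "__main__":
    owner_a = [0.5, -1.25]
    owner_b = [1.5, 0.25]
    # Each owner sends one share to each of three share holders; every holder
    # adds the shares it received and sees only random-looking values.
    held = [add_shares(sa, sb)
            for sa, sb in zip(split(owner_a, 3), split(owner_b, 3))]
    print(reconstruct(held))  # element-wise sum of the two parameter vectors
```

Any subset of fewer than all shares is uniformly random, so a single share holder learns nothing about an individual owner's parameters, while the summed shares still reconstruct the aggregate.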

Key words: hydropower data, data fusion, federated learning, inference attacks, data privacy
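The abstract states that aggregation weights depend on node dissimilarity and data volume and that suspected malicious nodes are excluded, but it does not give the scoring formula. The sketch below is therefore a hypothetical stand-in rather than the paper's information-disparity measure: it assumes dissimilarity is the Euclidean distance from the coordinate-wise median update, excludes nodes farther than `cutoff` times the median distance, and weights the remaining nodes by data volume divided by one plus their distance. The function name and the `cutoff` parameter are assumptions made for this example.

```python
import math
from statistics import median


def evaluate_and_aggregate(updates, sizes, cutoff=3.0):
    """Aggregate node updates with dissimilarity- and volume-aware weights.

    updates: list of parameter vectors, one per node
    sizes:   number of training samples held by each node (positive)
    cutoff:  assumed exclusion threshold, in multiples of the median distance
    """
    dim = len(updates[0])
    # Robust reference point: coordinate-wise median of all updates.
    center = [median(u[k] for u in updates) for k in range(dim)]
    # Assumed dissimilarity score: distance of each update from the reference.
    dist = [math.dist(u, center) for u in updates]
    typical = median(dist)
    # Exclude nodes whose update is far from the bulk of the others.
    keep = [i for i, d in enumerate(dist) if d <= cutoff * typical + 1e-12]
    # More data raises a node's weight; higher dissimilarity lowers it.
    raw = [sizes[i] / (1.0 + dist[i]) for i in keep]
    total = sum(raw)
    weights = {i: w / total for i, w in zip(keep, raw)}
    aggregate = [sum(weights[i] * updates[i][k] for i in keep)
                 for k in range(dim)]
    return aggregate, weights


if __name__ == "__main__":
    updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [10.0, -10.0]]
    sizes = [100, 100, 100, 100]
    aggregate, weights = evaluate_and_aggregate(updates, sizes)
    print(aggregate, weights)
```

If the distance term is dropped and no node is excluded, the weights reduce to data-volume proportions, which is the weighting used by Federated Averaging.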
