Journal of Computer Applications, 2024, Vol. 44, Issue 11: 3487-3494. DOI: 10.11772/j.issn.1001-9081.2023101500

• Cyber security •

Malicious traffic detection model based on semi-supervised federated learning

Shuaihua ZHANG (1,2), Shufen ZHANG (1,2,3), Mingchuan ZHOU (1,2), Chao XU (1,2), Xuebin CHEN (1,2,4)

  1. College of Sciences, North China University of Science and Technology, Tangshan, Hebei 063210, China
  2. Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan, Hebei 063210, China
  3. Tangshan Key Laboratory of Big Data Security and Intelligent Computing (Beijing Jiaotong University), Tangshan, Hebei 063210, China
  4. Tangshan Key Laboratory of Data Science (North China University of Science and Technology), Tangshan, Hebei 063210, China
  • Received: 2023-11-06; Revised: 2024-01-04; Accepted: 2024-01-12; Online: 2024-11-13; Published: 2024-11-10
  • Contact: Shufen ZHANG
  • About author: ZHANG Shuaihua, born in 1999, a native of Shijiazhuang, Hebei, is an M.S. candidate and CCF member. His research interests include data security, network security, and privacy protection.
    ZHOU Mingchuan, born in 1997, a native of Changchun, Jilin, is an M.S. candidate and CCF member. His research interests include network security and privacy protection.
    XU Chao, born in 1998, a native of Zhumadian, Henan, is an M.S. candidate and CCF member. His research interests include data security and privacy protection.
    CHEN Xuebin, born in 1970, a native of Tangshan, Hebei, Ph.D., is a professor and CCF distinguished member. His research interests include data security, IoT security, and network security.
  • Supported by:
    National Natural Science Foundation of China (U20A20179)

Abstract:

Malicious traffic detection is a key technology for addressing network security challenges. When federated learning is used for malicious traffic detection, clients often lack sufficient locally labeled data, and non-independent and identically distributed (non-IID) data degrades the performance of the collaboratively trained model. To address these problems, a malicious traffic detection model based on semi-supervised federated learning was constructed. The model uses two semi-supervised learning techniques, pseudo-labeling and a consistency regularization term, to extract information from unlabeled data for training. A nonlinear function was also designed to dynamically adjust the weights of each client's local supervised and unsupervised losses during aggregation, so as to make full use of the unlabeled data and improve model accuracy. To reduce the impact of non-IID data on the global model, a federated aggregation algorithm, FedLD (Federated-Loss-Data), was proposed; it adaptively adjusts the weight of each client model during global aggregation using a weight calculation that combines training loss and data volume. Experimental results on the NSL-KDD dataset show that the proposed model achieves relatively high detection accuracy when labeled data is limited. Compared with the baseline model FedSem (Federated Semi-supervised), detection accuracy increases by 4.11 percentage points, and recall on categories such as Normal, Denial-of-Service (DoS), and Probe increases by 1.65 to 7.66 percentage points, which indicates that the proposed model is better suited to malicious traffic detection.

Key words: federated learning, semi-supervised learning, malicious traffic detection, consistency regularization, dynamic aggregation weight
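
The abstract names three mechanisms without giving their formulas: a pseudo-label plus consistency loss on unlabeled data, a nonlinear function that weights the unsupervised loss, and an aggregation weight that combines training loss with data volume. The Python sketch below is only an illustration of how such pieces could fit together. Everything specific in it is an assumption rather than the paper's definition: the exponential ramp-up in `ramp_up`, the confidence `threshold`, the mixing factor `alpha`, the choice to give lower-loss clients more weight, and all function names.

```python
import math


def ramp_up(round_idx, total_rounds, max_weight=1.0):
    """Nonlinear weight for the unsupervised loss (assumed form, not the paper's).

    Grows smoothly from about 0.007 * max_weight to max_weight as training
    progresses. total_rounds must be positive.
    """
    t = min(max(round_idx / total_rounds, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)


def cross_entropy(probs, label):
    """Negative log-likelihood of one class under a probability vector."""
    return -math.log(max(probs[label], 1e-12))


def unsupervised_loss(probs_a, probs_b, threshold=0.9):
    """Pseudo-label term plus consistency term for one unlabeled sample.

    probs_a and probs_b are class probabilities predicted for two perturbed
    views of the same sample (hypothetical setup).
    """
    # Consistency regularization: the two predictions should agree.
    consistency = sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)
    # Pseudo-labeling: a confident prediction on one view supervises the other.
    confidence = max(probs_a)
    pseudo = 0.0
    if confidence >= threshold:
        pseudo = cross_entropy(probs_b, probs_a.index(confidence))
    return pseudo + consistency


def local_loss(sup_loss, unsup_loss, round_idx, total_rounds):
    """Client objective: supervised loss plus ramped-up unsupervised loss."""
    return sup_loss + ramp_up(round_idx, total_rounds) * unsup_loss


def aggregation_weights(losses, sizes, alpha=0.5):
    """Blend each client's data share with a softmax over negative loss.

    Assumption: clients with lower training loss receive more weight.
    """
    total = sum(sizes)
    data_share = [n / total for n in sizes]
    exp_neg = [math.exp(-loss) for loss in losses]
    norm = sum(exp_neg)
    loss_share = [e / norm for e in exp_neg]
    return [alpha * d + (1.0 - alpha) * s for d, s in zip(data_share, loss_share)]


def aggregate(client_params, weights):
    """Weighted average of flat parameter vectors, one vector per client."""
    dim = len(client_params[0])
    return [sum(w * p[i] for w, p in zip(weights, client_params)) for i in range(dim)]


if __name__ == "__main__":
    # Three hypothetical clients with made-up losses, sizes, and parameters.
    weights = aggregation_weights(losses=[0.4, 0.9, 0.6], sizes=[1000, 300, 700])
    print(weights)
    print(aggregate([[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]], weights))
```

With `alpha=1.0` the aggregation weights reduce to plain data-volume weighting, as in standard federated averaging; lowering `alpha` lets training loss influence the global model more. How the paper actually balances the two is not stated in the abstract.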

CLC Number: