Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (11): 3497-3503. DOI: 10.11772/j.issn.1001-9081.2022111792

• Cyber security •

DDoS attack detection by random forest fused with feature selection

Jingcheng XU1,2,3, Xuebin CHEN1,2,3, Yanling DONG1,2,3, Jia YANG1

  1. College of Sciences, North China University of Science and Technology, Tangshan Hebei 063210, China
  2. Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063210, China
  3. Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan Hebei 063210, China
  • Received: 2022-12-06 Revised: 2023-03-02 Accepted: 2023-03-03 Online: 2023-03-14 Published: 2023-11-10
  • Contact: Xuebin CHEN
  • About author: XU Jingcheng, born in 1996 in Changzhou, Jiangsu, M. S. candidate, CCF member. His research interests include data security and privacy protection.
    CHEN Xuebin, born in 1970 in Tangshan, Hebei, Ph. D., professor, CCF distinguished member. His research interests include big data security, Internet of Things security, and network security. E-mail: chxb@qq.com
    DONG Yanling, born in 1998 in Ningbo, Zhejiang, M. S. candidate, CCF member. Her research interests include data security and privacy protection.
    YANG Jia, born in 1996 in Tangshan, Hebei, M. S. candidate. His research interests include data mining and network security.
  • Supported by:
    National Natural Science Foundation of China (U20A20179)

Abstract:

Existing machine learning-based methods for Distributed Denial-of-Service (DDoS) attack detection face rising detection difficulty and cost as network traffic grows more complex and data dimensionality keeps increasing. To address these issues, a random forest DDoS attack detection method that integrates feature selection was proposed. In this method, the mean impurity algorithm based on the Gini coefficient was used as the feature selection algorithm to reduce the dimensionality of DDoS abnormal traffic samples, thereby reducing training cost and improving training accuracy. Meanwhile, the feature selection algorithm was embedded into each single base learner of the random forest, so that the feature subset search range was narrowed from all features to the features corresponding to a single base learner, which tightened the coupling of the two algorithms and improved model accuracy. Experimental results show that, with the number of decision trees and the number of training samples limited, the model trained by the proposed method achieves a recall 21.8 percentage points higher and an F1-score 12.0 percentage points higher than the model before improvement, both better than the traditional random forest detection scheme.

Key words: Distributed Denial-of-Service (DDoS), feature selection, Gini coefficient, mean impurity algorithm, random forest algorithm
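
The idea in the abstract of scoring features by Gini impurity inside each base learner, rather than over the full feature set, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the parameters `n_learners`, `subset_size` and `keep`, and the simplification of scoring a feature by its best single-split Gini decrease (instead of averaging the decrease over all nodes of a trained tree) are all assumptions made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_decrease(values, labels):
    """Largest impurity decrease over all threshold splits of one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest
    # (the split is "value <= t" versus "value > t").
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - child)
    return best


def select_features_per_learner(X, y, n_learners=5, subset_size=2, keep=1, seed=0):
    """Hypothetical helper: each base learner scores only its own random
    feature subset on its own bootstrap sample and keeps the top `keep`."""
    rng = random.Random(seed)
    n_features = len(X[0])
    selections = []
    for _ in range(n_learners):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap rows
        subset = rng.sample(range(n_features), subset_size)    # learner's features
        ys = [y[i] for i in rows]
        scores = {
            f: best_gini_decrease([X[i][f] for i in rows], ys) for f in subset
        }
        ranked = sorted(subset, key=lambda f: scores[f], reverse=True)
        selections.append(ranked[:keep])
    return selections


# Toy data (fabricated for illustration only): feature 0 separates the two
# classes, feature 1 is constant and therefore carries no information.
X = [[i, 7] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
selections = select_features_per_learner(X, y)
```

On this toy data, each learner is expected to keep the informative feature 0, because a constant feature can never reduce impurity. In a real pipeline the kept features would then be used to train that learner's decision tree.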

CLC Number: