DDoS attack detection by random forest fused with feature selection

Journal of Computer Applications, 2023, Vol. 43, Issue (11): 3497-3503. DOI: 10.11772/j.issn.1001-9081.2022111792

• Cyber security •

Jingcheng XU¹,²,³, Xuebin CHEN¹,²,³ (corresponding author), Yanling DONG¹,²,³, Jia YANG¹

1. College of Sciences, North China University of Science and Technology, Tangshan Hebei 063210, China
2. Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063210, China
3. Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan Hebei 063210, China

Received: 2022-12-06 Revised: 2023-03-02 Accepted: 2023-03-03 Online: 2023-03-14 Published: 2023-11-10
Contact: Xuebin CHEN (chxb@qq.com)
About the authors:
XU Jingcheng, born in 1996 in Changzhou, Jiangsu, M. S. candidate, CCF member. His research interests include data security and privacy protection.
CHEN Xuebin, born in 1970 in Tangshan, Hebei, Ph. D., professor, CCF distinguished member. His research interests include big data security, Internet of Things security, and network security.
DONG Yanling, born in 1998 in Ningbo, Zhejiang, M. S. candidate, CCF member. Her research interests include data security and privacy protection.
YANG Jia, born in 1996 in Tangshan, Hebei, M. S. candidate. His research interests include data mining and network security.
Supported by:
National Natural Science Foundation of China (U20A20179)

Abstract

Existing machine learning-based methods for Distributed Denial-of-Service (DDoS) attack detection face rising detection difficulty and cost as network traffic grows more complex and data dimensionality keeps increasing. To address these issues, a random forest DDoS attack detection method that integrates feature selection was proposed. In this method, the mean impurity algorithm based on the Gini coefficient was used as the feature selection algorithm to reduce the dimensionality of DDoS abnormal traffic samples, thereby reducing training cost and improving training accuracy. Meanwhile, the feature selection algorithm was embedded into each individual base learner of the random forest, so that the feature subset search range was narrowed from all features to the features corresponding to a single base learner, which tightened the coupling between the two algorithms and improved model accuracy. Experimental results show that, with the number of decision trees and the number of training samples limited, the model trained by the proposed method improves recall by 21.8 percentage points and F1-score by 12.0 percentage points compared with the model before improvement, and both metrics are better than those of the traditional random forest detection scheme.

Key words: Distributed Denial-of-Service (DDoS), feature selection, Gini coefficient, mean impurity algorithm, random forest algorithm
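
The idea described in the abstract, scoring features by Gini impurity decrease inside each base learner rather than over the full feature set, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`gini`, `best_split`, `build_tree`, `fit_base_learner`, `fit_forest`, `predict_forest`) and the parameters `n_sub` (features drawn per base learner) and `n_keep` (features kept after selection) are hypothetical, and the score here is accumulated within a single tree, which is only one plausible reading of the mean impurity algorithm.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a label list: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (Gini decrease, feature, threshold) of the best binary split."""
    best = (0.0, None, None)
    parent = gini(y)
    for f in features:
        # Skip the largest value: splitting there would leave the right side empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            children = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if parent - children > best[0]:
                best = (parent - children, f, t)
    return best


def build_tree(X, y, features, importance, max_depth=5):
    """Grow a CART-style tree; add each split's weighted Gini decrease to `importance`."""
    leaf = Counter(y).most_common(1)[0][0]  # majority class at this node
    if max_depth == 0:
        return leaf
    gain, f, t = best_split(X, y, features)
    if f is None:  # pure node, or no split reduces impurity
        return leaf
    importance[f] = importance.get(f, 0.0) + len(y) * gain
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    kids = [build_tree([X[i] for i in part], [y[i] for i in part],
                       features, importance, max_depth - 1) for part in (left, right)]
    return (f, t, kids[0], kids[1])


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_base_learner(X, y, rng, n_sub, n_keep):
    """Bootstrap, draw a feature subset, rank it by Gini decrease, refit on the top n_keep."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    subset = rng.sample(range(len(X[0])), n_sub)  # features seen by this learner only
    importance = {}
    build_tree(Xb, yb, subset, importance)  # first pass is used only for the scores
    ranked = sorted(subset, key=lambda f: importance.get(f, 0.0), reverse=True)
    kept = ranked[:n_keep]
    return build_tree(Xb, yb, kept, {}), kept


def fit_forest(X, y, n_trees=5, n_sub=2, n_keep=1, seed=0):
    """Train n_trees base learners, each with its own embedded feature selection."""
    rng = random.Random(seed)
    return [fit_base_learner(X, y, rng, n_sub, n_keep) for _ in range(n_trees)]


def predict_forest(forest, row):
    """Majority vote over the base learners."""
    votes = Counter(predict_tree(tree, row) for tree, _ in forest)
    return votes.most_common(1)[0][0]
```

Calling `fit_forest(X, y, n_trees=5, n_sub=2, n_keep=1)` on a list of numeric feature rows `X` and labels `y` returns a list of `(tree, kept_features)` pairs, so the features each base learner retained can be inspected directly. Ranking only the features of one base learner keeps the search space small, which matches the abstract's point about narrowing the feature subset search range; a practical system would more likely build on an established library than on this pure-Python sketch.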

CLC Number: TP393.08

Cite this article:

Jingcheng XU, Xuebin CHEN, Yanling DONG, Jia YANG. DDoS attack detection by random forest fused with feature selection[J]. Journal of Computer Applications, 2023, 43(11): 3497-3503.

Figures/Tables (4)

Fig. 1 Schematic diagram of DDoS attack

Fig. 2 DDoS attack detection flow

Fig. 3 Training results of four learning models

Fig. 4 Comparison of training time of four models


