Multi-type application-layer DDoS attack detection method based on integrated learning

Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3775-3784. DOI: 10.11772/j.issn.1001-9081.2021091653

Section: Cyberspace Security

Yingzhi LI, Man LI, Ping DONG, Huachun ZHOU

College of Electronic Information Engineering, Beijing Jiaotong University, Beijing 100044, China

Received: 2021-09-22 Revised: 2022-01-14 Accepted: 2022-01-28 Online: 2022-12-21 Published: 2022-12-10
Corresponding author: Yingzhi LI
About the authors: LI Man, born in 1997 in Puyang, Henan, Ph.D. candidate. Her research interests include cyber security and intelligent communication.
DONG Ping, born in 1979 in Beijing, Ph.D., professor. His research interests include next-generation Internet, smart Internet of vehicles, and mobile Internet.
ZHOU Huachun, born in 1965 in Beijing, Ph.D., professor. His research interests include intelligent communication, mobile Internet, network security, and satellite networks.
Supported by: National Key Research and Development Program of China (2018YFA0701604)

Abstract

To address the problem that application-layer Distributed Denial of Service (DDoS) attacks come in many types that are difficult to detect simultaneously, an application-layer DDoS attack detection method based on ensemble learning is proposed to detect multiple types of application-layer DDoS attacks. First, the dataset generation module simulates normal and attack traffic, filters and extracts the corresponding feature information, and generates 47-dimensional feature information characterizing Challenge Collapsar (CC), HTTP Flood, HTTP Post and HTTP Get attacks. Second, the offline training module feeds the processed effective features into the integrated Stacking detection model for training, yielding a detection model that can detect multiple types of application-layer DDoS attacks. Finally, the online detection module deploys the detection model online to determine the specific traffic type of the traffic under inspection. Experimental results show that, compared with the classification models built with Bagging, AdaBoost and XGBoost, the Stacking ensemble model improves accuracy by 0.18, 0.21 and 0.19 percentage points respectively, and that the malicious traffic detection rate reaches 98% under the optimal time window. These results verify the effectiveness of the proposed method for detecting multi-type application-layer DDoS attacks.

Key words: multi-type, application-layer Distributed Denial of Service (DDoS) attack, Distributed Denial of Service (DDoS), machine learning, ensemble learning

CLC Number: TP393

Cite this article: Yingzhi LI, Man LI, Ping DONG, Huachun ZHOU. Multi-type application-layer DDoS attack detection method based on integrated learning[J]. Journal of Computer Applications, 2022, 42(12): 3775-3784.

Traffic type	RF	XGBoost	ET	LightGBM	CNN	LSTM
CC	0.9912	0.9917	0.9858	0.9812	0.9494	0.9525
HTTP Flood	0.9107	0.9504	0.9017	0.9124	0.6863	0.7610
HTTP Post	0.8746	0.9111	0.8560	0.8767	0.3284	0.6412
HTTP Get	0.8247	0.8734	0.8111	0.8374	0.7751	0.6511
Benign	1.0000	1.0000	1.0000	1.0000	0.9986	0.9992
Other	1.0000	1.0000	0.9980	1.0000	0.9943	0.9940

Predicted class	Actual class 0	Actual class 1
0	True negative (TN)	False negative (FN)
1	False positive (FP)	True positive (TP)
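
The four cells of the confusion matrix are enough to derive the usual evaluation metrics. The sketch below uses the standard textbook definitions of accuracy, precision, recall and F1 score; the function name and the example counts are illustrative and are not taken from the experiments.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard metrics from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=90, fp=10, fn=30, tn=870)
print(metrics)
```

For a multi-class problem such as the six traffic types here, the same function can be applied once per class by treating that class as positive and all others as negative.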

Collection time	Source IP	Destination IP	Traffic type
2021-05-22T15:31:00 to 15:56:00	23.1.0.1	23.1.1.1	CC
	23.1.0.7	23.1.1.1	HTTP Flood
	23.1.0.8	23.1.1.1	HTTP Post
	23.1.0.9	23.1.1.1	HTTP Get
Continuous	23.1.0.20 to 23.1.0.29	23.1.1.7	Benign
2021-05-22T20:14:00 to 2021-05-23T16:15:00	23.1.0.1 to 23.1.0.13	23.1.1.2 to 23.1.1.6	Other
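
The source and destination addresses in the table above are distinct enough that a flow can be assigned a traffic type from its address pair alone. The sketch below shows one possible labeling rule built from those rows. It is an assumption that labels were assigned this way; the page does not describe the labeling procedure, and the collection time is ignored here. The names `RULES`, `ip_range` and `label_flow` are hypothetical.

```python
import ipaddress


def ip_range(first: str, last: str) -> tuple:
    """Return an inclusive address range as a pair of integers."""
    return (int(ipaddress.IPv4Address(first)),
            int(ipaddress.IPv4Address(last)))


# (source range, destination range, traffic type), one entry per table row.
RULES = [
    (ip_range("23.1.0.1", "23.1.0.1"), ip_range("23.1.1.1", "23.1.1.1"), "CC"),
    (ip_range("23.1.0.7", "23.1.0.7"), ip_range("23.1.1.1", "23.1.1.1"), "HTTP Flood"),
    (ip_range("23.1.0.8", "23.1.0.8"), ip_range("23.1.1.1", "23.1.1.1"), "HTTP Post"),
    (ip_range("23.1.0.9", "23.1.0.9"), ip_range("23.1.1.1", "23.1.1.1"), "HTTP Get"),
    (ip_range("23.1.0.20", "23.1.0.29"), ip_range("23.1.1.7", "23.1.1.7"), "Benign"),
    (ip_range("23.1.0.1", "23.1.0.13"), ip_range("23.1.1.2", "23.1.1.6"), "Other"),
]


def label_flow(src: str, dst: str):
    """Return the traffic type for a flow, or None if no rule matches."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    for (s_lo, s_hi), (d_lo, d_hi), label in RULES:
        if s_lo <= s <= s_hi and d_lo <= d <= d_hi:
            return label
    return None


print(label_flow("23.1.0.1", "23.1.1.1"))   # source 23.1.0.1 attacking 23.1.1.1
print(label_flow("23.1.0.1", "23.1.1.3"))   # same source, different destination
```

Note that source 23.1.0.1 appears in both the CC row and the Other row, so the destination address is needed to tell the two apart.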

Traffic type	Label	Proportion/%	Number of samples
CC	0	7.32	71591
HTTP Flood	1	1.66	16200
HTTP Post	2	1.57	15317
HTTP Get	3	1.69	16546
Benign	4	38.04	371830
Other	5	49.72	485953
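
The proportion column follows directly from the sample counts. The short check below recomputes it from the counts in the table above; the variable names are illustrative.

```python
# Sample counts per traffic type, copied from the table above.
counts = {
    "CC": 71591,
    "HTTP Flood": 16200,
    "HTTP Post": 15317,
    "HTTP Get": 16546,
    "Benign": 371830,
    "Other": 485953,
}

total = sum(counts.values())
# Percentage share of each traffic type, rounded to two decimals.
proportions = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, pct in proportions.items():
    print(f"{name}: {pct}%")
```

The three HTTP attack types each account for under 2% of the samples, so the dataset is strongly imbalanced. This is one reason macro-averaged metrics are more informative here than overall accuracy alone.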

Number of features	Traffic type	Precision	Recall	F1 score	Offline training time/s	Online detection time/s
47	CC	0.9913	0.9888	0.9909	3354.48	56.83
	HTTP Flood	0.9618	0.9730	0.9674
	HTTP Post	0.8726	0.9018	0.8870
	HTTP Get	0.9235	0.8760	0.8991
	Benign	1.0000	1.0000	1.0000
	Other	1.0000	1.0000	1.0000
78	CC	0.9676	0.9476	0.9575	4372.74	93.41
	HTTP Flood	0.8247	0.9284	0.8735
	HTTP Post	0.8727	0.9007	0.8865
	HTTP Get	0.9186	0.8714	0.8944
	Benign	1.0000	1.0000	1.0000
	Other	1.0000	0.9947	0.9973
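
Macro-averaging takes the unweighted mean of a metric over all classes, so each traffic type counts equally regardless of how many samples it has. The sketch below applies that definition to the six per-class rows of the 47-feature configuration above. The averages it produces coincide with the Stacking row of the model comparison table, which suggests, though the page does not state it, that the 47-feature rows report the Stacking model.

```python
# (precision, recall, F1) per traffic type for the 47-feature configuration,
# copied from the table above.
per_class_47 = {
    "CC": (0.9913, 0.9888, 0.9909),
    "HTTP Flood": (0.9618, 0.9730, 0.9674),
    "HTTP Post": (0.8726, 0.9018, 0.8870),
    "HTTP Get": (0.9235, 0.8760, 0.8991),
    "Benign": (1.0000, 1.0000, 1.0000),
    "Other": (1.0000, 1.0000, 1.0000),
}


def macro_average(rows: dict) -> tuple:
    """Unweighted mean of each metric column across all classes."""
    n = len(rows)
    sums = [sum(values[i] for values in rows.values()) for i in range(3)]
    return tuple(round(s / n, 4) for s in sums)


macro_precision, macro_recall, macro_f1 = macro_average(per_class_47)
print(macro_precision, macro_recall, macro_f1)
```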

Model	Accuracy	Macro-averaged precision	Macro-averaged recall	Macro-averaged F1 score
Bagging	0.9925	0.9423	0.9380	0.9397
AdaBoost	0.9922	0.9348	0.9318	0.9331
XGBoost	0.9924	0.9338	0.9476	0.9405
Stacking	0.9943	0.9582	0.9566	0.9574
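
The accuracy gains quoted in the abstract are differences in percentage points between Stacking and each baseline. The check below recomputes them from the accuracy column of the table above; the variable names are illustrative.

```python
# Accuracy per model, copied from the table above.
accuracy = {
    "Bagging": 0.9925,
    "AdaBoost": 0.9922,
    "XGBoost": 0.9924,
    "Stacking": 0.9943,
}

# Gain of Stacking over each baseline, expressed in percentage points.
gain_pp = {
    model: round((accuracy["Stacking"] - value) * 100, 2)
    for model, value in accuracy.items()
    if model != "Stacking"
}
print(gain_pp)
```

The gaps in macro-averaged F1 score are considerably larger (about 1.7 to 2.4 percentage points), because overall accuracy is dominated by the Benign and Other classes, which every model classifies almost perfectly.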

Attack type	Precision	Recall	F1 score
Benign	0.9997	0.9998	0.9999
CC	0.9885	0.9849	0.9867
HTTP Flood	0.8898	0.9778	0.9317
HTTP Get	0.9292	0.8778	0.9028
HTTP Post	0.8653	0.9144	0.8892
Other	0.9968	0.9915	0.9941

Time window/min	Malicious traffic detection rate/%
1	97.88
2	98.01
3	97.14
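
The abstract reports a detection rate of 98% under the optimal time window. Assuming "optimal" means the window with the highest malicious traffic detection rate, which the page does not define explicitly, the choice can be read off the table as follows.

```python
# Malicious traffic detection rate (%) per time window (minutes),
# copied from the table above.
detection_rate = {1: 97.88, 2: 98.01, 3: 97.14}

# Assumption: the optimal window is the one with the highest detection rate.
best_window = max(detection_rate, key=detection_rate.get)
print(best_window, detection_rate[best_window])
```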
