Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3775-3784. DOI: 10.11772/j.issn.1001-9081.2021091653
Special topic: Cyberspace security
Authors: Yingzhi LI, Man LI, Ping DONG, Huachun ZHOU
Received: 2021-09-22
Revised: 2022-01-14
Accepted: 2022-01-28
Online: 2022-12-21
Published: 2022-12-10
Contact: Yingzhi LI
About author: LI Man, born in 1997 in Puyang, Henan, Ph. D. candidate. Her research interests include cyber security and intelligent communication.
Abstract:
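Application-layer Distributed Denial of Service (DDoS) attacks come in many types that are difficult to detect at the same time. To address this problem, an application-layer DDoS attack detection method based on ensemble learning was proposed to detect multiple types of application-layer DDoS attacks. Firstly, the dataset generation module simulated normal and attack traffic, filtered and extracted the corresponding feature information, and generated 47-dimensional feature information characterizing Challenge Collapsar (CC), HTTP Flood, HTTP Post and HTTP Get attacks. Secondly, the offline training module fed the processed effective feature information into the integrated Stacking detection model for training, yielding a detection model capable of detecting multiple types of application-layer DDoS attacks. Finally, the online detection module deployed the detection model online to determine the specific traffic type of the traffic to be detected. Experimental results show that, compared with the classification models built with Bagging, AdaBoost and XGBoost, the Stacking ensemble model improves accuracy by 0.18, 0.21 and 0.19 percentage points respectively, and that its malicious traffic detection rate reaches 98% under the optimal time window. These results verify the effectiveness of the proposed method for detecting multiple types of application-layer DDoS attacks.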
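The Stacking idea named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the two base learners (`CentroidClassifier` on single feature columns) and the meta-learner (`VoteTableMeta`) are hypothetical toy stand-ins, the data is synthetic, and the fold count `k=5` is an assumption. The sketch only shows the general mechanism, in which base learners produce out-of-fold predictions and a meta-learner is trained on those predictions.

```python
import random
from collections import Counter, defaultdict


class CentroidClassifier:
    """Toy base learner: nearest class centroid on a subset of feature columns."""

    def __init__(self, columns):
        self.columns = columns
        self.centroids = {}

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append([row[c] for c in self.columns])
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def _nearest(self, row):
        point = [row[c] for c in self.columns]
        return min(
            self.centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(point, self.centroids[lab])),
        )

    def predict(self, X):
        return [self._nearest(row) for row in X]


class VoteTableMeta:
    """Toy meta-learner: maps each tuple of base predictions to its most common true label."""

    def fit(self, Z, y):
        votes = defaultdict(Counter)
        for z, label in zip(Z, y):
            votes[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]


def make_bases():
    # Hypothetical split: each base learner sees a different feature column.
    return [CentroidClassifier([0]), CentroidClassifier([1])]


def fit_stacking(X, y, k=5):
    n = len(X)
    out_of_fold = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held = [i for i in range(n) if i % k == fold]
        bases = [b.fit([X[i] for i in train], [y[i] for i in train]) for b in make_bases()]
        preds = [b.predict([X[i] for i in held]) for b in bases]
        for j, i in enumerate(held):
            out_of_fold[i] = tuple(p[j] for p in preds)
    # The meta-learner only ever sees predictions made on held-out samples.
    meta = VoteTableMeta().fit(out_of_fold, y)
    bases = [b.fit(X, y) for b in make_bases()]  # refit on all data for deployment
    return bases, meta


def predict_stacking(bases, meta, X):
    preds = [b.predict(X) for b in bases]
    return meta.predict(list(zip(*preds)))


# Synthetic 3-class data: column 0 separates class 0, column 1 separates class 2.
random.seed(0)
y = [i % 3 for i in range(300)]
X = [[random.gauss(0.0 if c == 0 else 5.0, 1.0),
      random.gauss(5.0 if c == 2 else 0.0, 1.0)] for c in y]

bases, meta = fit_stacking(X, y)
stacked = predict_stacking(bases, meta, X)
accuracy = sum(p == t for p, t in zip(stacked, y)) / len(y)
base_accuracy = [sum(p == t for p, t in zip(b.predict(X), y)) / len(y) for b in bases]
print(f"stacking accuracy: {accuracy:.3f}, base accuracies: {base_accuracy}")
```

In this toy setting each base learner can separate only one class on its own, so the combined model does better than either; with real traffic features the gain would depend on how complementary the base classifiers are.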
Yingzhi LI, Man LI, Ping DONG, Huachun ZHOU. Multi‑type application‑layer DDoS attack detection method based on integrated learning[J]. Journal of Computer Applications, 2022, 42(12): 3775-3784.
| Traffic type | RF | XGBoost | ET | LightGBM | CNN | LSTM |
|---|---|---|---|---|---|---|
| CC | 0.9912 | 0.9917 | 0.9858 | 0.9812 | 0.9494 | 0.9525 |
| HTTP Flood | 0.9107 | 0.9504 | 0.9017 | 0.9124 | 0.6863 | 0.7610 |
| HTTP Post | 0.8746 | 0.9111 | 0.8560 | 0.8767 | 0.3284 | 0.6412 |
| HTTP Get | 0.8247 | 0.8734 | 0.8111 | 0.8374 | 0.7751 | 0.6511 |
| Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9986 | 0.9992 |
| Other | 1.0000 | 1.0000 | 0.9980 | 1.0000 | 0.9943 | 0.9940 |
Tab.1 Recall comparison of base classifiers
| Predicted class | True class: 0 | True class: 1 |
|---|---|---|
| 0 | True negative (TN) | False negative (FN) |
| 1 | False positive (FP) | True positive (TP) |
Tab.2 Example of confusion matrix for binary classification
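As a brief illustration of how the four confusion-matrix cells feed the metrics reported in the later tables, the sketch below uses the standard textbook definitions of accuracy, precision, recall and F1 score. The paper's own formulas are not reproduced on this page, so treating them as identical is an assumption, and the counts passed in are made up for the example.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of true positives that are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts, not taken from the paper.
accuracy, precision, recall, f1 = binary_metrics(tp=90, fp=10, fn=30, tn=870)
print(accuracy, precision, recall, round(f1, 4))
```

For a multi-class problem such as the six traffic types here, the same function can be applied per class by treating that class as positive and all others as negative.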
| Collection time | Source IP | Destination IP | Traffic type |
|---|---|---|---|
| 2021-05-22T15:31:00 to 15:56:00 | 23.1.0.1 | 23.1.1.1 | CC |
| | 23.1.0.7 | 23.1.1.1 | HTTP Flood |
| | 23.1.0.8 | 23.1.1.1 | HTTP Post |
| | 23.1.0.9 | 23.1.1.1 | HTTP Get |
| Continuous | 23.1.0.20~23.1.0.29 | 23.1.1.7 | Benign |
| 2021-05-22T20:14:00 to 2021-05-23T16:15:00 | 23.1.0.1~23.1.0.13 | 23.1.1.2~23.1.1.6 | Other |
Tab.3 Flow time node summary
| Traffic type | Traffic label | Proportion of traffic type/% | Number of samples |
|---|---|---|---|
| CC | 0 | 7.32 | 71591 |
| HTTP Flood | 1 | 1.66 | 16200 |
| HTTP Post | 2 | 1.57 | 15317 |
| HTTP Get | 3 | 1.69 | 16546 |
| Benign | 4 | 38.04 | 371830 |
| Other | 5 | 49.72 | 485953 |
Tab.4 Percentages of different traffic types
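The proportions in Tab.4 follow directly from the per-class counts. The short check below recomputes them from the counts in the table; the variable names are illustrative only.

```python
# Per-class sample counts copied from Tab.4.
counts = {
    "CC": 71591,
    "HTTP Flood": 16200,
    "HTTP Post": 15317,
    "HTTP Get": 16546,
    "Benign": 371830,
    "Other": 485953,
}

total = sum(counts.values())
# Share of each class in percent, rounded to two decimals as in the table.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
print("total samples:", total)
```

The four attack classes together make up only about 12% of the samples, so per-class recall and macro-averaged metrics say more about attack detection than overall accuracy does.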
| Number of features | Traffic type | Precision | Recall | F1 score | Offline training time/s | Online detection time/s |
|---|---|---|---|---|---|---|
| 47 | CC | 0.9913 | 0.9888 | 0.9909 | 3354.48 | 56.83 |
| | HTTP Flood | 0.9618 | 0.9730 | 0.9674 | | |
| | HTTP Post | 0.8726 | 0.9018 | 0.8870 | | |
| | HTTP Get | 0.9235 | 0.8760 | 0.8991 | | |
| | Benign | 1.0000 | 1.0000 | 1.0000 | | |
| | Other | 1.0000 | 1.0000 | 1.0000 | | |
| 78 | CC | 0.9676 | 0.9476 | 0.9575 | 4372.74 | 93.41 |
| | HTTP Flood | 0.8247 | 0.9284 | 0.8735 | | |
| | HTTP Post | 0.8727 | 0.9007 | 0.8865 | | |
| | HTTP Get | 0.9186 | 0.8714 | 0.8944 | | |
| | Benign | 1.0000 | 1.0000 | 1.0000 | | |
| | Other | 1.0000 | 0.9947 | 0.9973 | | |
Tab.5 Index comparison of different feature training models
| Model | Accuracy | Macro-average precision | Macro-average recall | Macro-average F1 score |
|---|---|---|---|---|
| Bagging | 0.9925 | 0.9423 | 0.9380 | 0.9397 |
| AdaBoost | 0.9922 | 0.9348 | 0.9318 | 0.9331 |
| XGBoost | 0.9924 | 0.9338 | 0.9476 | 0.9405 |
| Stacking | 0.9943 | 0.9582 | 0.9566 | 0.9574 |
Tab.6 Performance comparison between Stacking model and other integration strategy models
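The macro-averaged figures can be read as the unweighted mean of the per-class values. As a rough check, the sketch below averages the per-class precision, recall and F1 score of the 47-feature model in Tab.5 and also computes the accuracy gaps quoted in the abstract. Treating the 47-feature rows of Tab.5 as the Stacking model of Tab.6 is an assumption, although the arithmetic is consistent with it.

```python
def macro_average(values):
    """Unweighted mean over classes: every class counts equally, regardless of size."""
    return sum(values) / len(values)


# Per-class values for the 47-feature model, copied from Tab.5
# (order: CC, HTTP Flood, HTTP Post, HTTP Get, Benign, Other).
per_class = {
    "precision": [0.9913, 0.9618, 0.8726, 0.9235, 1.0, 1.0],
    "recall": [0.9888, 0.9730, 0.9018, 0.8760, 1.0, 1.0],
    "f1": [0.9909, 0.9674, 0.8870, 0.8991, 1.0, 1.0],
}
macro = {name: round(macro_average(vals), 4) for name, vals in per_class.items()}
print(macro)

# Accuracy values copied from Tab.6; gains are expressed in percentage points.
accuracy = {"Bagging": 0.9925, "AdaBoost": 0.9922, "XGBoost": 0.9924, "Stacking": 0.9943}
gain_pp = {
    model: round((accuracy["Stacking"] - value) * 100, 2)
    for model, value in accuracy.items()
    if model != "Stacking"
}
print(gain_pp)
```

Because the macro average weights the small attack classes as heavily as the large Benign and Other classes, the gap between models is wider on the macro metrics than on accuracy.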
| Time window/min | Malicious traffic detection rate/% |
|---|---|
| 1 | 97.88 |
| 2 | 98.01 |
| 3 | 97.14 |
Tab.7 Malicious traffic detection rate comparison
| Traffic type | Precision | Recall | F1 score |
|---|---|---|---|
| Benign | 0.9997 | 0.9998 | 0.9999 |
| CC | 0.9885 | 0.9849 | 0.9867 |
| HTTP Flood | 0.8898 | 0.9778 | 0.9317 |
| HTTP Get | 0.9292 | 0.8778 | 0.9028 |
| HTTP Post | 0.8653 | 0.9144 | 0.8892 |
| Other | 0.9968 | 0.9915 | 0.9941 |
Tab.8 Test performance for optimal time window