Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3775-3784. DOI: 10.11772/j.issn.1001-9081.2021091653
• Cyber security •
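Received:
2021-09-22
Revised:
2022-01-14
Accepted:
2022-01-28
Online:
2022-12-21
Published:
2022-12-10
Corresponding author:
Yingzhi LI
About the author:
LI Man (born 1997), female, from Puyang, Henan, Ph.D. candidate. Her research interests include cyber security and intelligent communication.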
Yingzhi LI, Man LI, Ping DONG, Huachun ZHOU
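Abstract:
Application-layer Distributed Denial of Service (DDoS) attacks come in many types that are difficult to detect at the same time. To address this problem, an application-layer DDoS attack detection method based on ensemble learning was proposed to detect multiple types of application-layer DDoS attacks. Firstly, the dataset generation module simulated normal and attack traffic, filtered and extracted the corresponding feature information, and generated 47-dimensional feature information characterizing Challenge Collapsar (CC), HTTP Flood, HTTP Post and HTTP Get attacks. Secondly, the offline training module fed the processed valid feature information into the integrated Stacking detection model for training, yielding a detection model able to detect multiple types of application-layer DDoS attacks. Finally, the online detection module deployed the detection model online to determine the specific type of the traffic to be detected. Experimental results show that, compared with the classification models built with Bagging, AdaBoost and XGBoost, the Stacking ensemble model improves accuracy by 0.18, 0.21 and 0.19 percentage points respectively, and that its malicious traffic detection rate reaches 98% under the optimal time window. These results verify the effectiveness of the proposed method for detecting multiple types of application-layer DDoS attacks.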
Yingzhi LI, Man LI, Ping DONG, Huachun ZHOU. Multi‑type application‑layer DDoS attack detection method based on integrated learning[J]. Journal of Computer Applications, 2022, 42(12): 3775-3784.
Traffic type | RF | XGBoost | ET | LightGBM | CNN | LSTM |
---|---|---|---|---|---|---|
CC | 0.9912 | 0.9917 | 0.9858 | 0.9812 | 0.9494 | 0.9525 |
HTTP Flood | 0.9107 | 0.9504 | 0.9017 | 0.9124 | 0.6863 | 0.7610 |
HTTP Post | 0.8746 | 0.9111 | 0.8560 | 0.8767 | 0.3284 | 0.6412 |
HTTP Get | 0.8247 | 0.8734 | 0.8111 | 0.8374 | 0.7751 | 0.6511 |
Benign | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9986 | 0.9992 |
Other | 1.0000 | 1.0000 | 0.9980 | 1.0000 | 0.9943 | 0.9940 |
Tab.1 Recall comparison of base classifiers
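The base classifiers compared in Tab.1 feed the Stacking model described in the abstract. As a rough illustration of the stacking idea only, the following pure-Python sketch trains two toy base learners, collects their out-of-fold predictions, and fits a meta-learner on those predictions. Everything in it is an assumption made for illustration: the two stand-in learners (nearest centroid and 1-nearest-neighbour), the lookup-table meta-learner, the two made-up features and the toy samples are not taken from the paper, which compares RF, XGBoost, ET, LightGBM, CNN and LSTM as base classifiers and works with 47-dimensional features.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid(X, y):
    """Toy base learner 1: predict the class with the nearest centroid."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))


def fit_1nn(X, y):
    """Toy base learner 2: predict the label of the nearest training sample."""
    samples = list(zip(X, y))
    return lambda row: min(samples, key=lambda s: sq_dist(row, s[0]))[1]


def fit_meta(Z, y):
    """Toy meta-learner: majority label for each tuple of base predictions."""
    table = defaultdict(Counter)
    for z, label in zip(Z, y):
        table[tuple(z)][label] += 1
    fallback = Counter(y).most_common(1)[0][0]

    def predict(z):
        votes = table.get(tuple(z))
        return votes.most_common(1)[0][0] if votes else fallback

    return predict


def fit_stacking(X, y, base_fitters, k=3):
    """Collect k-fold out-of-fold base predictions, then train the meta-learner."""
    n = len(X)
    Z = [[None] * len(base_fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fitter in enumerate(base_fitters):
            model = fitter(X_tr, y_tr)
            for i in held_out:
                Z[i][j] = model(X[i])  # prediction on a sample the model never saw
    meta = fit_meta(Z, y)
    bases = [fitter(X, y) for fitter in base_fitters]  # refit on all data
    return lambda row: meta([base(row) for base in bases])


# Made-up two-feature samples (hypothetical features, not the paper's 47).
X = [[1, 1], [1, 2], [2, 1], [2, 2], [1.5, 1.5], [1, 1.5],
     [10, 1], [10, 2], [11, 1], [11, 2], [10.5, 1.5], [10, 1.5],
     [1, 10], [2, 10], [1, 11], [2, 11], [1.5, 10.5], [1, 10.5]]
y = ["Benign"] * 6 + ["HTTP Get"] * 6 + ["HTTP Post"] * 6

detect = fit_stacking(X, y, [fit_centroid, fit_1nn])
predictions = [detect(row) for row in [[1.2, 1.3], [10.4, 1.1], [1.4, 10.8]]]
print(predictions)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for samples the base learners were fitted on, keeps it from simply learning to trust base models that have memorized their training data.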
Predicted class | Actual class 0 | Actual class 1 |
---|---|---|
0 | True negative (TN) | False negative (FN) |
1 | False positive (FP) | True positive (TP) |
Tab.2 Example of confusion matrix for binary classification
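The four cells of the confusion matrix in Tab.2 are what precision, recall and F1 score are computed from. A minimal sketch using the standard definitions follows; the counts passed to `binary_metrics` are hypothetical, and the function names are chosen here for illustration rather than taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from the four confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=90, fp=10, fn=30, tn=870)
print(m)

# Consistency check against reported values: a precision of 0.9618 and a
# recall of 0.9730 give an F1 score of about 0.9674.
print(round(f1_score(0.9618, 0.9730), 4))
```

In the multi-class setting of the paper, the same counts can be obtained per traffic type by treating that type as class 1 and all other types as class 0.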
Collection time | Source IP | Destination IP | Traffic type |
---|---|---|---|
2021-05-22T15:31:00 to 15:56:00 | 23.1.0.1 | 23.1.1.1 | CC |
2021-05-22T15:31:00 to 15:56:00 | 23.1.0.7 | 23.1.1.1 | HTTP Flood |
2021-05-22T15:31:00 to 15:56:00 | 23.1.0.8 | 23.1.1.1 | HTTP Post |
2021-05-22T15:31:00 to 15:56:00 | 23.1.0.9 | 23.1.1.1 | HTTP Get |
Continuous | 23.1.0.20~23.1.0.29 | 23.1.1.7 | Benign |
2021-05-22T20:14:00 to 2021-05-23T16:15:00 | 23.1.0.1~23.1.0.13 | 23.1.1.2~23.1.1.6 | Other |
Tab.3 Summary of traffic collection times
Traffic type | Label | Share of traffic/% | Number of flows |
---|---|---|---|
CC | 0 | 7.32 | 71,591 |
HTTP Flood | 1 | 1.66 | 16,200 |
HTTP Post | 2 | 1.57 | 15,317 |
HTTP Get | 3 | 1.69 | 16,546 |
Benign | 4 | 38.04 | 371,830 |
Other | 5 | 49.72 | 485,953 |
Tab.4 Percentages of different traffic types
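The percentages in Tab.4 follow directly from the per-type counts. The short sketch below recomputes them as a sanity check; the variable names are chosen here for illustration, and the counts are copied from the table.

```python
# Per-type counts copied from Tab.4.
counts = {
    "CC": 71591,
    "HTTP Flood": 16200,
    "HTTP Post": 15317,
    "HTTP Get": 16546,
    "Benign": 371830,
    "Other": 485953,
}

total = sum(counts.values())

# Share of each traffic type in percent, rounded to two decimals as in the table.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
print("total:", total)
```

The recomputed shares agree with the table and show how strongly the dataset is dominated by the Benign and Other classes compared with the three HTTP attack types.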
Number of features | Traffic type | Precision | Recall | F1 score | Offline training time/s | Online detection time/s |
---|---|---|---|---|---|---|
47 | CC | 0.9913 | 0.9888 | 0.9909 | 3354.48 | 56.83 |
47 | HTTP Flood | 0.9618 | 0.9730 | 0.9674 | | |
47 | HTTP Post | 0.8726 | 0.9018 | 0.8870 | | |
47 | HTTP Get | 0.9235 | 0.8760 | 0.8991 | | |
47 | Benign | 1.0000 | 1.0000 | 1.0000 | | |
47 | Other | 1.0000 | 1.0000 | 1.0000 | | |
78 | CC | 0.9676 | 0.9476 | 0.9575 | 4372.74 | 93.41 |
78 | HTTP Flood | 0.8247 | 0.9284 | 0.8735 | | |
78 | HTTP Post | 0.8727 | 0.9007 | 0.8865 | | |
78 | HTTP Get | 0.9186 | 0.8714 | 0.8944 | | |
78 | Benign | 1.0000 | 1.0000 | 1.0000 | | |
78 | Other | 1.0000 | 0.9947 | 0.9973 | | |
Tab.5 Metric comparison of models trained with different numbers of features
Model | Accuracy | Macro-average precision | Macro-average recall | Macro-average F1 score |
---|---|---|---|---|
Bagging | 0.9925 | 0.9423 | 0.9380 | 0.9397 |
AdaBoost | 0.9922 | 0.9348 | 0.9318 | 0.9331 |
XGBoost | 0.9924 | 0.9338 | 0.9476 | 0.9405 |
Stacking | 0.9943 | 0.9582 | 0.9566 | 0.9574 |
Tab.6 Performance comparison between Stacking model and other integration strategy models
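A macro average is the unweighted mean of a metric over all classes, so the small attack classes count as much as the large Benign and Other classes. As a hedged worked example, averaging the per-class values reported for the 47-feature model in Tab.5 appears to reproduce the Stacking row of Tab.6, and differencing the accuracies gives the percentage-point gains quoted in the abstract. The variable names below are illustrative only.

```python
# (precision, recall, F1) per traffic type for the 47-feature model, from Tab.5.
per_class = {
    "CC": (0.9913, 0.9888, 0.9909),
    "HTTP Flood": (0.9618, 0.9730, 0.9674),
    "HTTP Post": (0.8726, 0.9018, 0.8870),
    "HTTP Get": (0.9235, 0.8760, 0.8991),
    "Benign": (1.0000, 1.0000, 1.0000),
    "Other": (1.0000, 1.0000, 1.0000),
}

# Macro average: plain mean of each metric column across the six classes.
macro = [round(sum(col) / len(col), 4) for col in zip(*per_class.values())]
print("macro precision, recall, F1:", macro)

# Accuracy values from Tab.6.
accuracy = {"Bagging": 0.9925, "AdaBoost": 0.9922, "XGBoost": 0.9924, "Stacking": 0.9943}

# Gain of Stacking over each other model, in percentage points.
gains = {
    name: round((accuracy["Stacking"] - value) * 100, 2)
    for name, value in accuracy.items()
    if name != "Stacking"
}
print("accuracy gains (percentage points):", gains)
```

Because accuracy is dominated by the majority classes, the differences between models are much larger in the macro-averaged columns than in the accuracy column.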
Time window/min | Malicious traffic detection rate/% |
---|---|
1 | 97.88 |
2 | 98.01 |
3 | 97.14 |
Tab.7 Malicious traffic detection rate comparison
Traffic type | Precision | Recall | F1 score |
---|---|---|---|
Benign | 0.9997 | 0.9998 | 0.9999 |
CC | 0.9885 | 0.9849 | 0.9867 |
HTTP Flood | 0.8898 | 0.9778 | 0.9317 |
HTTP Get | 0.9292 | 0.8778 | 0.9028 |
HTTP Post | 0.8653 | 0.9144 | 0.8892 |
Other | 0.9968 | 0.9915 | 0.9941 |
Tab.8 Detection performance for the optimal time window