Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (6): 1841-1848. DOI: 10.11772/j.issn.1001-9081.2024060840
• Artificial Intelligence •
Network traffic classification model integrating variational autoencoder and AdaBoost-CNN
Daoquan LI, Zheng XU, Sihui CHEN, Jiayu LIU
Received:
2024-06-24
Revised:
2024-09-09
Accepted:
2024-09-10
Online:
2024-09-25
Published:
2025-06-10
Contact:
Zheng XU
About author:
LI Daoquan, born in 1967, Ph. D., professor. His research interests include Internet of Things, software-defined networking, network security, and electronic commerce.
Abstract:
Network traffic classification is a long-standing problem whose solutions have had to evolve continuously with the development of network communication, and many approaches have been proposed to date. When classifying network data, most existing methods focus on class-balanced datasets for the convenience of experiments and computation, whereas most real-world network datasets remain imbalanced. To address this problem, a network traffic classification model named VAE-ABC (Variational AutoEncoder-Adaptive Boosting-Convolutional neural network), which integrates a Variational AutoEncoder (VAE) with an adaptive boosting convolutional neural network (AdaBoost-CNN), was proposed. Firstly, at the data level, a VAE was used to partially augment the imbalanced dataset, and the VAE's ability to learn the latent distribution of the data was exploited to shorten the learning time. Secondly, to improve classification performance at the algorithm level, following the idea of ensemble learning, an AdaBoost-CNN algorithm using an improved Convolutional Neural Network (CNN) as the weak classifier was designed on the basis of the Adaptive Boosting (AdaBoost) algorithm, thereby improving the accuracy of learning and training. Finally, a fully connected layer was used to complete the feature mapping, and the final classification result was obtained through the Sigmoid activation function. Results of multiple comparative experiments show that the proposed model achieves an accuracy of 94.31% on the imbalanced sub-dataset partitioned from the ISCX VPN-nonVPN dataset, which is 1.34, 0.63 and 0.24 percentage points higher than those of AdaBoost-SVM with a Support Vector Machine (SVM) as the weak classifier, SMOTE-SVM combining the Synthetic Minority Oversampling TEchnique (SMOTE) with SVM, and SMOTE-AB-D-T combining SMOTE with AdaBoost using a Decision Tree (D-T) as the weak classifier, respectively. It can be seen that the proposed model achieves better classification performance than the other models on this dataset.
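The boosting stage described in the abstract follows the general pattern of multi-class AdaBoost (SAMME) with a small CNN as the weak learner. The sketch below is a minimal illustration of that pattern only, not the authors' implementation: the `build_cnn` factory, the epoch count, and the assumption that the VAE-augmented training set is already prepared are all hypothetical, and `n_estimators` merely defaults to 12, the number of weak classifiers that performed best in Tab. 4.

```python
import numpy as np

def adaboost_cnn_fit(build_cnn, X, y, n_classes, n_estimators=12, epochs=5):
    """SAMME-style AdaBoost with CNN weak learners (illustrative sketch).

    build_cnn : hypothetical factory returning a fresh Keras-style model whose
                fit() accepts sample_weight and whose predict() returns class
                probabilities; X, y are the (VAE-augmented) training data.
    """
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                      # uniform initial sample weights
    learners, alphas = [], []
    for _ in range(n_estimators):
        model = build_cnn()
        model.fit(X, y, sample_weight=w, epochs=epochs, verbose=0)
        pred = np.argmax(model.predict(X, verbose=0), axis=1)
        miss = (pred != y)
        err = np.dot(w, miss) / w.sum()          # weighted training error
        if err >= 1.0 - 1.0 / n_classes:         # no better than random guessing
            break
        err = max(err, 1e-10)
        alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1.0)
        w *= np.exp(alpha * miss)                # emphasise misclassified samples
        w /= w.sum()
        learners.append(model)
        alphas.append(alpha)
    return learners, np.asarray(alphas)

def adaboost_cnn_predict(learners, alphas, X, n_classes):
    """Combine the weak CNNs by an alpha-weighted vote."""
    votes = np.zeros((X.shape[0], n_classes))
    for model, alpha in zip(learners, alphas):
        pred = np.argmax(model.predict(X, verbose=0), axis=1)
        votes[np.arange(X.shape[0]), pred] += alpha
    return np.argmax(votes, axis=1)
```

Note that the paper additionally maps the combined features through a fully connected layer and a Sigmoid activation to obtain the final result; the plain alpha-weighted vote above is only the generic SAMME combination step.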
Daoquan LI, Zheng XU, Sihui CHEN, Jiayu LIU. Network traffic classification model integrating variational autoencoder and AdaBoost-CNN[J]. Journal of Computer Applications, 2025, 45(6): 1841-1848.
| Traffic category | Traffic type | Traffic size/MB |
| --- | --- | --- |
| non-VPN | Email | 7.85 |
| | Chat | 34.60 |
| | Streaming | 2 826.20 |
| | File Transfer | 17 715.20 |
| | P2P | 96.80 |
| | VoIP | 4.48 |
| VPN | VPN-Email | 7.80 |
| | VPN-Chat | 27.60 |
| | VPN-Streaming | 1.37 |
| | VPN-File Transfer | 279.00 |
| | VPN-P2P | 358.00 |
| | VPN-VoIP | 360.00 |
Tab. 1 Twelve categories in ISCX VPN-nonVPN dataset
| No. | Layer | Configuration |
| --- | --- | --- |
| 1 | 1D Convolution | 32 filters, 3×1 kernel, ReLU |
| 2 | Max-Pooling | 2×1 kernel |
| 3 | Dropout | 20% |
| 4 | Fully connected | 128 neurons, ReLU |
| 5 | Dropout | 20% |
| 6 | Fully connected | 64 neurons, ReLU |
| 7 | Fully connected | 3 neurons, Softmax |
Tab. 2 Parameter settings corresponding to each layer
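The paper does not state which framework was used; assuming a Keras-style implementation, a weak classifier matching the configuration in Tab. 2 might be sketched as follows. The input length `input_len` and the Flatten layer inserted between the pooling and fully connected stages are assumptions needed to make the model well-formed.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_weak_cnn(input_len: int, n_classes: int = 3) -> tf.keras.Model:
    """1D-CNN weak classifier following the layer configuration of Tab. 2 (sketch)."""
    inputs = tf.keras.Input(shape=(input_len, 1))
    x = layers.Conv1D(32, kernel_size=3, activation="relu")(inputs)  # 1: 32 filters, 3x1 kernel, ReLU
    x = layers.MaxPooling1D(pool_size=2)(x)                          # 2: 2x1 max pooling
    x = layers.Dropout(0.2)(x)                                       # 3: 20% dropout
    x = layers.Flatten()(x)                                          # assumed: flatten before dense layers
    x = layers.Dense(128, activation="relu")(x)                      # 4: 128 neurons, ReLU
    x = layers.Dropout(0.2)(x)                                       # 5: 20% dropout
    x = layers.Dense(64, activation="relu")(x)                       # 6: 64 neurons, ReLU
    outputs = layers.Dense(n_classes, activation="softmax")(x)       # 7: 3 neurons, Softmax
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A factory of this kind could serve as the `build_cnn` argument of the boosting sketch given after the abstract, e.g. `build_cnn=lambda: build_weak_cnn(input_len=784)` (the value 784 is only an example, not a figure taken from the paper).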
| Model | Accuracy/% | Computation time/s |
| --- | --- | --- |
| VAE-ABC-7layers | 96.53 | 426.11 |
| VAE-ABC-5layers | 95.58 | 389.37 |
| VAE-H1 | 62.22 | 105.93 |
| ABC-5layers | 94.31 | 1 974.61 |
| ABC-7layers | 95.61 | 2 447.65 |
| H1 | 47.16 | 34.61 |
Tab. 3 Model ablation experiment results
| Number of classifiers | Training accuracy/% | Test accuracy/% |
| --- | --- | --- |
| 8 | 94.41 | 93.28 |
| 10 | 95.89 | 94.00 |
| 12 | 96.02 | 94.15 |
| 15 | 95.43 | 93.62 |
Tab. 4 Experimental results of model comparison under different numbers of classifiers
| Model | Training accuracy/% | Test accuracy/% |
| --- | --- | --- |
| AdaBoost-CNN | 96.02 | 94.15 |
| AdaBoost-D-T | 91.78 | 77.08 |
| 1DCNN-5Epochs | 95.00 | 91.35 |
| 1DCNN-10Epochs | 95.67 | 92.18 |
| 1DCNN-15Epochs | 94.84 | 91.26 |
Tab. 5 Training and testing accuracies of different models on ISCX VPN-nonVPN dataset
| Model | Training accuracy/% | Test accuracy/% |
| --- | --- | --- |
| VAE-ABC | 96.53 | 94.31 |
| AdaBoost-CNN | 96.02 | 94.15 |
| 1DCNN-10Epochs | 95.67 | 92.18 |
| CNN-Weighted | 95.87 | 93.24 |
| CNN-Loss | 95.11 | 91.59 |
| AdaBoost-SVM | 94.24 | 92.97 |
| SMOTE-SVM | 95.88 | 93.68 |
| SMOTE-AdaBoost-DT | 96.21 | 94.07 |
| AdaBoost-D-T | 91.78 | 77.08 |
| ResNet | 93.46 | 82.05 |
Tab. 6 Test results of different models on ISCX VPN-nonVPN dataset
1 REZAEI S, LIU X. Deep learning for encrypted traffic classification: an overview[J]. IEEE Communications Magazine, 2019, 57(5): 76-81.
2 YU Z P, LIU C X, LIU S X, et al. Overview of network traffic classification based on machine learning[J]. Journal of Information Engineering University, 2023, 24(4): 447-453, 483.
3 WANG H Y, FAN H K, YAO Z A, et al. Research of imbalanced data classification[J]. Application Research of Computers, 2008, 25(5): 1301-1303, 1308.
4 LEE W, JUN C H, LEE J S. Instance categorization by support vector machines to adjust weights in AdaBoost for imbalanced data classification[J]. Information Sciences, 2017, 381: 92-103.
5 GALAR M, FERNÁNDEZ A, BARRENECHEA E, et al. Ordering-based pruning for improving the performance of ensembles of classifiers in the framework of imbalanced datasets[J]. Information Sciences, 2016, 354: 178-196.
6 HU X, GU C, WEI F. CLD-Net: a network combining CNN and LSTM for Internet encrypted traffic classification[J]. Security and Communication Networks, 2021, 2021: No.5518460.
7 LIU D, YAO L S, WANG Y F, et al. Classification model for class imbalanced traffic data[J]. Journal of Computer Applications, 2020, 40(8): 2327-2333.
8 BUDA M, MAKI A, MAZUROWSKI M A. A systematic study of the class imbalance problem in convolutional neural networks[J]. Neural Networks, 2018, 106: 249-259.
9 SUN Z, SONG Q, ZHU X, et al. A novel ensemble method for classifying imbalanced data[J]. Pattern Recognition, 2015, 48(5): 1623-1637.
10 VUCETIC S, OBRADOVIC Z. Classification on data with biased class distribution[C]// Proceedings of the 12th European Conference on Machine Learning, LNCS 2167. Berlin: Springer, 2001: 527-538.
11 REZAEI A, YAZDINEJAD M, SOOKHAK M. Credit card fraud detection using tree-based algorithms for highly imbalanced data[C]// Proceedings of the IEEE 3rd International Conference on Computing and Machine Intelligence. Piscataway: IEEE, 2024: 1-6.
12 NAMKOONG H, DUCHI J C. Variance-based regularization with convex objectives[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 2975-2984.
13 ZOU T K, WANG Y Y, WU C R. Review of network background traffic classification and identification[J]. Journal of Computer Applications, 2019, 39(3): 802-811.
14 WANG W, ZHU M, ZENG X, et al. Malware traffic classification using convolutional neural network for representation learning[C]// Proceedings of the 2017 International Conference on Information Networking. Piscataway: IEEE, 2017: 712-717.
15 CIREȘAN D, MEIER U, SCHMIDHUBER J. Multi-column deep neural networks for image classification[C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2012: 3642-3649.
16 FRAZÃO X, ALEXANDRE L A. Weighted convolutional neural network ensemble[C]// Proceedings of the 2014 Iberoamerican Congress on Pattern Recognition, LNCS 8827. Cham: Springer, 2014: 674-681.
17 TAHERKHANI A, COSMA G, McGINNITY T M. AdaBoost-CNN: an adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning[J]. Neurocomputing, 2020, 404: 351-366.
18 KRUSE L E, KÜHL S, DOCHHAN A, et al. Monitoring data augmentation of spectral information using VAE and GAN for soft-failure identification[C]// Proceedings of the 2024 Optical Fiber Communications Conference and Exhibition. Piscataway: IEEE, 2024: 1-3.
19 LYGERAKIS F, RUECKERT E. CR-VAE: contrastive regularization on variational autoencoders for preventing posterior collapse[C]// Proceedings of the 7th Asian Conference on Artificial Intelligence Technology. Piscataway: IEEE, 2023: 427-437.
20 XIE S L, CHEN H D, GAO J L, et al. Deep multi-view clustering based on distribution aligned variational autoencoder[J]. Chinese Journal of Computers, 2023, 46(5): 945-959.
21 DRAPER-GIL G, LASHKARI A H, MAMUN M S I, et al. Characterization of encrypted and VPN traffic using time-related features[C]// Proceedings of the 2nd International Conference on Information Systems Security and Privacy. Setúbal: SciTePress, 2016: 407-414.
22 WANG W, ZHU M, WANG J, et al. End-to-end encrypted traffic classification with one-dimensional convolution neural networks[C]// Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics. Piscataway: IEEE, 2017: 43-48.
23 MIYAKE N, TAKIGUCHI T, ARIKI Y, et al. Noise detection with multi-class AdaBoost[EB/OL]. [2024-04-21].
24 BIAN Y C. Research on network intrusion detection method based on SVM and AdaBoost[D]. Shenyang: Shenyang University of Technology, 2022: 35-43.
25 WU H Y, CHEN X L, FAN G X. An adaptive kernel SMOTE-SVM algorithm for imbalanced data classification[J]. Journal of Beijing University of Chemical Technology (Natural Science Edition), 2023, 50(2): 97-104.
26 ZHAO J L, XU M J, WU Z Y, et al. A SMOTE-AdaBoost-DT model for credit scoring[J]. Journal of China University of Metrology, 2021, 32(4): 549-554.