Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (6): 1841-1848. DOI: 10.11772/j.issn.1001-9081.2024060840
• Artificial Intelligence •
Network traffic classification model integrating variational autoencoder and AdaBoost-CNN
Daoquan LI, Zheng XU, Sihui CHEN, Jiayu LIU
Received:
2024-06-24
Revised:
2024-09-09
Accepted:
2024-09-10
Online:
2024-09-25
Published:
2025-06-10
Contact:
Zheng XU
About author:
LI Daoquan, born in 1967, Ph. D., professor. His research interests include Internet of Things, software-defined networking, network security, and electronic commerce.
Abstract:
Network traffic classification is a long-standing problem whose solutions have had to evolve continuously with the development of network communication, and many approaches have been proposed to date. When classifying network data, most existing methods focus on class-balanced datasets for the convenience of experiments and computation, whereas most real-world network datasets remain imbalanced. To address this problem, a network traffic classification model named VAE-ABC (Variational AutoEncoder-Adaptive Boosting-Convolutional neural network), which integrates a Variational AutoEncoder (VAE) with an adaptive boosting convolutional neural network (AdaBoost-CNN), was proposed. Firstly, at the data level, a VAE was used to partially augment the imbalanced dataset, and the VAE's ability to learn the latent distribution of the data was exploited to shorten the learning time. Secondly, to improve classification performance at the algorithm level, following the idea of ensemble learning, an AdaBoost-CNN algorithm using an improved Convolutional Neural Network (CNN) as the weak classifier was designed on the basis of the Adaptive Boosting (AdaBoost) algorithm, thereby improving the accuracy of learning and training. Finally, a fully connected layer was used to complete the feature mapping, and the final classification result was obtained through the Sigmoid activation function. Results of multiple comparative experiments show that the proposed model achieves an accuracy of 94.31% on the imbalanced sub-dataset partitioned from the ISCX VPN-nonVPN dataset, which is 1.34, 0.63 and 0.24 percentage points higher than those of AdaBoost-SVM with a Support Vector Machine (SVM) as the weak classifier, SMOTE-SVM combining the Synthetic Minority Oversampling TEchnique (SMOTE) with SVM, and SMOTE-AB-D-T combining SMOTE with AdaBoost using a Decision Tree (D-T) as the weak classifier, respectively. It can be seen that the proposed model achieves better classification performance than the other models on this dataset.
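The boosting stage described in the abstract follows the general pattern of multi-class AdaBoost (SAMME) with a small CNN as the weak learner. The sketch below is a minimal illustration of that pattern only, not the authors' implementation: the `build_cnn` factory, the epoch count, and the assumption that the VAE-augmented training set is already prepared are all hypothetical, and `n_estimators` merely defaults to 12, the number of weak classifiers that performed best in Tab. 4.

```python
import numpy as np

def adaboost_cnn_fit(build_cnn, X, y, n_classes, n_estimators=12, epochs=5):
    """SAMME-style AdaBoost with CNN weak learners (illustrative sketch).

    build_cnn : hypothetical factory returning a fresh Keras-style model whose
                fit() accepts sample_weight and whose predict() returns class
                probabilities; X, y are the (VAE-augmented) training data.
    """
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                      # uniform initial sample weights
    learners, alphas = [], []
    for _ in range(n_estimators):
        model = build_cnn()
        model.fit(X, y, sample_weight=w, epochs=epochs, verbose=0)
        pred = np.argmax(model.predict(X, verbose=0), axis=1)
        miss = (pred != y)
        err = np.dot(w, miss) / w.sum()          # weighted training error
        if err >= 1.0 - 1.0 / n_classes:         # no better than random guessing
            break
        err = max(err, 1e-10)
        alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1.0)
        w *= np.exp(alpha * miss)                # emphasise misclassified samples
        w /= w.sum()
        learners.append(model)
        alphas.append(alpha)
    return learners, np.asarray(alphas)

def adaboost_cnn_predict(learners, alphas, X, n_classes):
    """Combine the weak CNNs by an alpha-weighted vote."""
    votes = np.zeros((X.shape[0], n_classes))
    for model, alpha in zip(learners, alphas):
        pred = np.argmax(model.predict(X, verbose=0), axis=1)
        votes[np.arange(X.shape[0]), pred] += alpha
    return np.argmax(votes, axis=1)
```

Note that the paper additionally maps the combined features through a fully connected layer and a Sigmoid activation to obtain the final result; the plain alpha-weighted vote above is only the generic SAMME combination step.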
Daoquan LI, Zheng XU, Sihui CHEN, Jiayu LIU. Network traffic classification model integrating variational autoencoder and AdaBoost-CNN[J]. Journal of Computer Applications, 2025, 45(6): 1841-1848.
| Traffic category | Traffic type | Traffic size/MB |
| --- | --- | --- |
| non-VPN | Email | 7.85 |
| | Chat | 34.60 |
| | Streaming | 2 826.20 |
| | File Transfer | 17 715.20 |
| | P2P | 96.80 |
| | VoIP | 4.48 |
| VPN | VPN-Email | 7.80 |
| | VPN-Chat | 27.60 |
| | VPN-Streaming | 1.37 |
| | VPN-File Transfer | 279.00 |
| | VPN-P2P | 358.00 |
| | VPN-VoIP | 360.00 |
Tab. 1 Twelve categories in ISCX VPN-nonVPN dataset
| No. | Layer | Configuration |
| --- | --- | --- |
| 1 | 1D Convolution | 32 filters, 3×1 kernel, ReLU |
| 2 | Max-Pooling | 2×1 kernel |
| 3 | Dropout | 20% |
| 4 | Fully connected | 128 neurons, ReLU |
| 5 | Dropout | 20% |
| 6 | Fully connected | 64 neurons, ReLU |
| 7 | Fully connected | 3 neurons, Softmax |
Tab. 2 Parameter settings corresponding to each layer
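The paper does not state which framework was used; assuming a Keras-style implementation, a weak classifier matching the configuration in Tab. 2 might be sketched as follows. The input length `input_len` and the Flatten layer inserted between the pooling and fully connected stages are assumptions needed to make the model well-formed.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_weak_cnn(input_len: int, n_classes: int = 3) -> tf.keras.Model:
    """1D-CNN weak classifier following the layer configuration of Tab. 2 (sketch)."""
    inputs = tf.keras.Input(shape=(input_len, 1))
    x = layers.Conv1D(32, kernel_size=3, activation="relu")(inputs)  # 1: 32 filters, 3x1 kernel, ReLU
    x = layers.MaxPooling1D(pool_size=2)(x)                          # 2: 2x1 max pooling
    x = layers.Dropout(0.2)(x)                                       # 3: 20% dropout
    x = layers.Flatten()(x)                                          # assumed: flatten before dense layers
    x = layers.Dense(128, activation="relu")(x)                      # 4: 128 neurons, ReLU
    x = layers.Dropout(0.2)(x)                                       # 5: 20% dropout
    x = layers.Dense(64, activation="relu")(x)                       # 6: 64 neurons, ReLU
    outputs = layers.Dense(n_classes, activation="softmax")(x)       # 7: 3 neurons, Softmax
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A factory of this kind could serve as the `build_cnn` argument of the boosting sketch given after the abstract, e.g. `build_cnn=lambda: build_weak_cnn(input_len=784)` (the value 784 is only an example, not a figure taken from the paper).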
| Model | Accuracy/% | Computation time/s |
| --- | --- | --- |
| VAE-ABC-7layers | 96.53 | 426.11 |
| VAE-ABC-5layers | 95.58 | 389.37 |
| VAE-H1 | 62.22 | 105.93 |
| ABC-5layers | 94.31 | 1 974.61 |
| ABC-7layers | 95.61 | 2 447.65 |
| H1 | 47.16 | 34.61 |
Tab. 3 Model ablation experiment results
| Number of classifiers | Training accuracy/% | Test accuracy/% |
| --- | --- | --- |
| 8 | 94.41 | 93.28 |
| 10 | 95.89 | 94.00 |
| 12 | 96.02 | 94.15 |
| 15 | 95.43 | 93.62 |
Tab. 4 Experimental results of model comparison under different numbers of classifiers
| Model | Training accuracy/% | Test accuracy/% |
| --- | --- | --- |
| AdaBoost-CNN | 96.02 | 94.15 |
| AdaBoost-D-T | 91.78 | 77.08 |
| 1DCNN-5Epochs | 95.00 | 91.35 |
| 1DCNN-10Epochs | 95.67 | 92.18 |
| 1DCNN-15Epochs | 94.84 | 91.26 |
Tab. 5 Training and testing accuracies of different models on ISCX VPN-nonVPN dataset
| Model | Training accuracy/% | Test accuracy/% |
| --- | --- | --- |
| VAE-ABC | 96.53 | 94.31 |
| AdaBoost-CNN | 96.02 | 94.15 |
| 1DCNN-10Epochs | 95.67 | 92.18 |
| CNN-Weighted | 95.87 | 93.24 |
| CNN-Loss | 95.11 | 91.59 |
| AdaBoost-SVM | 94.24 | 92.97 |
| SMOTE-SVM | 95.88 | 93.68 |
| SMOTE-AdaBoost-DT | 96.21 | 94.07 |
| AdaBoost-D-T | 91.78 | 77.08 |
| ResNet | 93.46 | 82.05 |
Tab. 6 Test results of different models on ISCX VPN-nonVPN dataset
1 REZAEI S, LIU X. Deep learning for encrypted traffic classification: an overview[J]. IEEE Communications Magazine, 2019, 57(5): 76-81.
2 YU Z P, LIU C X, LIU S X, et al. Overview of network traffic classification based on machine learning[J]. Journal of Information Engineering University, 2023, 24(4): 447-453, 483.
3 WANG H Y, FAN H K, YAO Z A, et al. Research of imbalanced data classification[J]. Application Research of Computers, 2008, 25(5): 1301-1303, 1308.
4 LEE W, JUN C H, LEE J S. Instance categorization by support vector machines to adjust weights in AdaBoost for imbalanced data classification[J]. Information Sciences, 2017, 381: 92-103.
5 GALAR M, FERNÁNDEZ A, BARRENECHEA E, et al. Ordering-based pruning for improving the performance of ensembles of classifiers in the framework of imbalanced datasets[J]. Information Sciences, 2016, 354: 178-196.
6 HU X, GU C, WEI F. CLD-Net: a network combining CNN and LSTM for Internet encrypted traffic classification[J]. Security and Communication Networks, 2021, 2021: No.5518460.
7 LIU D, YAO L S, WANG Y F, et al. Classification model for class imbalanced traffic data[J]. Journal of Computer Applications, 2020, 40(8): 2327-2333.
8 BUDA M, MAKI A, MAZUROWSKI M A. A systematic study of the class imbalance problem in convolutional neural networks[J]. Neural Networks, 2018, 106: 249-259.
9 SUN Z, SONG Q, ZHU X, et al. A novel ensemble method for classifying imbalanced data[J]. Pattern Recognition, 2015, 48(5): 1623-1637.
10 VUCETIC S, OBRADOVIC Z. Classification on data with biased class distribution[C]// Proceedings of the 12th European Conference on Machine Learning, LNCS 2167. Berlin: Springer, 2001: 527-538.
11 REZAEI A, YAZDINEJAD M, SOOKHAK M. Credit card fraud detection using tree-based algorithms for highly imbalanced data[C]// Proceedings of the IEEE 3rd International Conference on Computing and Machine Intelligence. Piscataway: IEEE, 2024: 1-6.
12 NAMKOONG H, DUCHI J C. Variance-based regularization with convex objectives[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 2975-2984.
13 ZOU T K, WANG Y Y, WU C R. Review of network background traffic classification and identification[J]. Journal of Computer Applications, 2019, 39(3): 802-811.
14 WANG W, ZHU M, ZENG X, et al. Malware traffic classification using convolutional neural network for representation learning[C]// Proceedings of the 2017 International Conference on Information Networking. Piscataway: IEEE, 2017: 712-717.
15 CIREȘAN D, MEIER U, SCHMIDHUBER J. Multi-column deep neural networks for image classification[C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2012: 3642-3649.
16 FRAZÃO X, ALEXANDRE L A. Weighted convolutional neural network ensemble[C]// Proceedings of the 2014 Iberoamerican Congress on Pattern Recognition, LNCS 8827. Cham: Springer, 2014: 674-681.
17 TAHERKHANI A, COSMA G, McGINNITY T M. AdaBoost-CNN: an adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning[J]. Neurocomputing, 2020, 404: 351-366.
18 KRUSE L E, KÜHL S, DOCHHAN A, et al. Monitoring data augmentation of spectral information using VAE and GAN for soft-failure identification[C]// Proceedings of the 2024 Optical Fiber Communications Conference and Exhibition. Piscataway: IEEE, 2024: 1-3.
19 LYGERAKIS F, RUECKERT E. CR-VAE: contrastive regularization on variational autoencoders for preventing posterior collapse[C]// Proceedings of the 7th Asian Conference on Artificial Intelligence Technology. Piscataway: IEEE, 2023: 427-437.
20 XIE S L, CHEN H D, GAO J L, et al. Deep multi-view clustering based on distribution aligned variational autoencoder[J]. Chinese Journal of Computers, 2023, 46(5): 945-959.
21 DRAPER-GIL G, LASHKARI A H, MAMUN M S I, et al. Characterization of encrypted and VPN traffic using time-related features[C]// Proceedings of the 2nd International Conference on Information Systems Security and Privacy. Setúbal: SciTePress, 2016: 407-414.
22 WANG W, ZHU M, WANG J, et al. End-to-end encrypted traffic classification with one-dimensional convolution neural networks[C]// Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics. Piscataway: IEEE, 2017: 43-48.
23 MIYAKE N, TAKIGUCHI T, ARIKI Y, et al. Noise detection with multi-class AdaBoost[EB/OL]. [2024-04-21].
24 BIAN Y C. Research on network intrusion detection method based on SVM and AdaBoost[D]. Shenyang: Shenyang University of Technology, 2022: 35-43.
25 WU H Y, CHEN X L, FAN G X. An adaptive kernel SMOTE-SVM algorithm for imbalanced data classification[J]. Journal of Beijing University of Chemical Technology (Natural Science Edition), 2023, 50(2): 97-104.
26 ZHAO J L, XU M J, WU Z Y, et al. A SMOTE-AdaBoost-DT model for credit scoring[J]. Journal of China University of Metrology, 2021, 32(4): 549-554.