Network traffic classification model integrating variational autoencoder and AdaBoost-CNN

Journal of Computer Applications, 2025, Vol. 45, Issue 6: 1841-1848. DOI: 10.11772/j.issn.1001-9081.2024060840

• Artificial intelligence •

Daoquan LI, Zheng XU, Sihui CHEN, Jiayu LIU

School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China

Received: 2024-06-24 Revised: 2024-09-09 Accepted: 2024-09-10 Online: 2024-09-25 Published: 2025-06-10
Contact: Zheng XU (1455080545@qq.com)
About the authors:
LI Daoquan, born in 1967 in Rizhao, Shandong, Ph.D., professor, CCF member. His research interests include the Internet of Things, software-defined networking, network security, and electronic commerce.
XU Zheng, born in 2001 in Fuzhou, Jiangxi, M.S. candidate. His research interests include network security, machine learning, deep learning, traffic classification, and software-defined networking.
CHEN Sihui, born in 2001 in Xiaogan, Hubei, M.S. candidate. Her research interests include intrusion detection, network security, and data mining.
LIU Jiayu, born in 2000 in Zaozhuang, Shandong, M.S. candidate. His research interests include network security, deep learning, and traffic classification.
Supported by:
General Program of the Shandong Provincial Natural Science Foundation (ZR2023MF052)

Abstract

Network traffic classification is a long-standing problem whose methods have evolved alongside network communication, and many solutions have been proposed. Most current methods focus on class-balanced datasets to simplify experiments and computation. To address the fact that most real-world network datasets remain unbalanced, a network traffic classification model, VAE-ABC (Variational AutoEncoder-Adaptive Boosting-Convolutional neural network), is proposed that integrates a Variational AutoEncoder (VAE) with an Adaptive Boosting Convolutional Neural Network (AdaBoost-CNN). First, at the data level, the VAE partially augments the unbalanced dataset, and its ability to learn the latent distribution of the data shortens learning time. Second, to improve classification at the algorithm level, the AdaBoost-CNN algorithm is designed, following the idea of ensemble learning, on the basis of the Adaptive Boosting (AdaBoost) algorithm with an improved Convolutional Neural Network (CNN) as the weak classifier, thereby improving learning and training accuracy. Finally, a fully connected layer completes the feature mapping, and the final classification results are obtained through a Sigmoid activation function. Multiple comparative experiments show that the proposed model achieves an accuracy of 94.31% on an unbalanced sub-dataset partitioned from the ISCX VPN-nonVPN classification dataset. Compared with AdaBoost-SVM (which uses a Support Vector Machine (SVM) as the weak classifier), SMOTE-SVM (which combines the Synthetic Minority Oversampling TEchnique (SMOTE) with SVM), and SMOTE-AB-D-T (which uses a Decision Tree (D-T) as the weak classifier and is combined with SMOTE), the proposed model improves accuracy by 1.34, 0.63 and 0.24 percentage points, respectively.
On this dataset, the proposed model therefore classifies better than the other models.
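As a rough illustration of the boosting step described in the abstract, the sketch below performs one round of the generic multi-class AdaBoost (SAMME) sample-weight update in plain Python. It is not the authors' AdaBoost-CNN implementation: the function name `samme_round`, the toy labels, and the choice of the SAMME variant are assumptions made for illustration, and the weak learner's predictions are simply given rather than produced by a CNN.

```python
import math


def samme_round(weights, y_true, y_pred, n_classes):
    """One round of multi-class AdaBoost (SAMME), shown for illustration only.

    Returns the weak learner's vote weight (alpha) and the re-normalised
    sample weights. Requires 0 < weighted error < 1.
    """
    total = sum(weights)
    # 1 where the weak learner is wrong, 0 where it is right
    miss = [int(t != p) for t, p in zip(y_true, y_pred)]
    err = sum(w * m for w, m in zip(weights, miss)) / total
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    # The extra log(K - 1) term is what distinguishes SAMME from binary AdaBoost
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Misclassified samples are up-weighted so the next learner focuses on them
    new_w = [w * math.exp(alpha * m) for w, m in zip(weights, miss)]
    norm = sum(new_w)
    return alpha, [w / norm for w in new_w]


# Toy, hypothetical data: the single minority-class sample (class 1) is missed
y_true = [0, 0, 0, 1, 2, 2]
y_pred = [0, 0, 0, 0, 2, 2]
w0 = [1 / 6] * 6
alpha, w1 = samme_round(w0, y_true, y_pred, n_classes=3)
print(f"alpha = {alpha:.4f}")
print("updated weights:", [round(w, 4) for w in w1])
```

In this toy run the misclassified minority sample ends up carrying most of the weight mass, which is the mechanism that lets boosting pay more attention to under-represented classes.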

Key words: network traffic classification, unbalanced dataset, data augmentation, Variational AutoEncoder (VAE), ensemble learning, Adaptive Boosting (AdaBoost) algorithm

CLC Number:

TP309.2

Daoquan LI, Zheng XU, Sihui CHEN, Jiayu LIU. Network traffic classification model integrating variational autoencoder and AdaBoost-CNN[J]. Journal of Computer Applications, 2025, 45(6): 1841-1848.

Traffic category	Traffic type	Traffic size/MB
non-VPN	Email	7.85
	Chat	34.60
	Streaming	2 826.20
	File Transfer	17 715.20
	P2P	96.80
	VoIP	4.48
VPN	VPN-Email	7.80
	VPN-Chat	27.60
	VPN-Streaming	1.37
	VPN-File Transfer	279.00
	VPN-P2P	358.00
	VPN-VoIP	360.00
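To make the degree of imbalance in the table above concrete, the following sketch computes the ratio between the largest and smallest traffic types and each type's share of the total. The sizes are copied from the table; note that they are capture sizes in MB, not sample counts, so the ratio is only a rough proxy for class imbalance, and the variable names are hypothetical.

```python
# Traffic sizes in MB, copied from the table above
sizes_mb = {
    "Email": 7.85, "Chat": 34.60, "Streaming": 2826.20,
    "File Transfer": 17715.20, "P2P": 96.80, "VoIP": 4.48,
    "VPN-Email": 7.80, "VPN-Chat": 27.60, "VPN-Streaming": 1.37,
    "VPN-File Transfer": 279.00, "VPN-P2P": 358.00, "VPN-VoIP": 360.00,
}

total_mb = sum(sizes_mb.values())
largest = max(sizes_mb, key=sizes_mb.get)
smallest = min(sizes_mb, key=sizes_mb.get)
imbalance_ratio = sizes_mb[largest] / sizes_mb[smallest]

# Share of the total volume contributed by each traffic type
shares = {name: size / total_mb for name, size in sizes_mb.items()}

print(f"largest: {largest}, smallest: {smallest}")
print(f"largest/smallest ratio: {imbalance_ratio:.1f}")
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18}{share:8.2%}")
```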

No.	Layer	Configuration
1	1D Convolution	32 filters, 3×1 kernel, ReLU
2	Max-Pooling	2×1 kernel
3	Dropout	20%
4	Fully connected	128 neurons, ReLU
5	Dropout	20%
6	Fully connected	64 neurons, ReLU
7	Fully connected	3 neurons, Softmax
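The layer stack in the table above can be checked with a small shape and parameter calculator. This is a sketch under assumptions that the table does not state: a single-channel input of 784 values, stride-1 convolution without padding, and a pooling stride of 2. The helper names `dense_params` and `cnn_summary` are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def cnn_summary(input_len=784, in_channels=1):
    """Output shape and trainable parameters for each layer in the table.

    input_len=784 and in_channels=1 are assumptions, not values from the table.
    """
    conv_len = input_len - 3 + 1   # kernel size 3, no padding, stride 1
    pool_len = conv_len // 2       # pooling window 2, stride 2
    flat = pool_len * 32           # flattened before the first dense layer
    return [
        ("1D Convolution", (conv_len, 32), (3 * in_channels + 1) * 32),
        ("Max-Pooling", (pool_len, 32), 0),
        ("Dropout 20%", (pool_len, 32), 0),
        ("Fully connected 128", (128,), dense_params(flat, 128)),
        ("Dropout 20%", (128,), 0),
        ("Fully connected 64", (64,), dense_params(128, 64)),
        ("Fully connected 3", (3,), dense_params(64, 3)),
    ]


summary = cnn_summary()
total_params = sum(params for _, _, params in summary)
for name, shape, params in summary:
    print(f"{name:<22}{str(shape):<12}{params:>10,}")
print(f"total trainable parameters: {total_params:,}")
```

Under these assumptions almost all parameters sit in the first fully connected layer, so the input length chosen for each flow dominates the size of every weak classifier.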

Model	Accuracy/%	Computing time/s
VAE-ABC-7layers	96.53	426.11
VAE-ABC-5layers	95.58	389.37
VAE-H1	62.22	105.93
ABC-5layers	94.31	1 974.61
ABC-7layers	95.61	2 447.65
H1	47.16	34.61
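The effect of the VAE stage on accuracy and computing time can be read off the table above with simple arithmetic. The sketch below assumes that each `VAE-ABC-*` row differs from the matching `ABC-*` row only by the VAE augmentation step, which the row names suggest but the table does not state; the helper `compare` is hypothetical.

```python
# (accuracy in %, computing time in s), copied from the table above
results = {
    "VAE-ABC-7layers": (96.53, 426.11),
    "VAE-ABC-5layers": (95.58, 389.37),
    "ABC-7layers": (95.61, 2447.65),
    "ABC-5layers": (94.31, 1974.61),
}


def compare(with_vae, without_vae):
    """Return (accuracy gain in percentage points, time ratio)."""
    acc_v, time_v = results[with_vae]
    acc_b, time_b = results[without_vae]
    return acc_v - acc_b, time_b / time_v


gain5, speedup5 = compare("VAE-ABC-5layers", "ABC-5layers")
gain7, speedup7 = compare("VAE-ABC-7layers", "ABC-7layers")
for depth, gain, speedup in (("5 layers", gain5, speedup5),
                             ("7 layers", gain7, speedup7)):
    print(f"{depth}: {gain:+.2f} percentage points, "
          f"{speedup:.2f}x shorter computing time")
```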

Model	Training accuracy/%	Test accuracy/%
AdaBoost-CNN	96.02	94.15
AdaBoost-D-T	91.78	77.08
1DCNN-5Epochs	95.00	91.35
1DCNN-10Epochs	95.67	92.18
1DCNN-15Epochs	94.84	91.26

Model	Training accuracy/%	Test accuracy/%
VAE-ABC	96.53	94.31
AdaBoost-CNN	96.02	94.15
1DCNN-10Epochs	95.67	92.18
CNN-Weighted	95.87	93.24
CNN-Loss	95.11	91.59
AdaBoost-SVM	94.24	92.97
SMOTE-SVM	95.88	93.68
SMOTE-AdaBoost-DT	96.21	94.07
AdaBoost-D-T	91.78	77.08
ResNet	93.46	82.05

Number of classifiers	Training accuracy/%	Test accuracy/%
8	94.41	93.28
10	95.89	94.00
12	96.02	94.15
15	95.43	93.62
