Journal of Computer Applications, 2025, Vol. 45, Issue (6): 1841-1848. DOI: 10.11772/j.issn.1001-9081.2024060840

• Artificial intelligence •

Network traffic classification model integrating variational autoencoder and AdaBoost-CNN

Daoquan LI, Zheng XU, Sihui CHEN, Jiayu LIU

  1. School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China
  • Received: 2024-06-24 Revised: 2024-09-09 Accepted: 2024-09-10 Online: 2024-09-25 Published: 2025-06-10
  • Contact: Zheng XU
  • About author:LI Daoquan, born in 1967, Ph. D., professor. His research interests include internet of things, software defined network, network security, electronic commerce.
    XU Zheng, born in 2001, M. S. candidate. His research interests include network security, machine learning, deep learning, traffic classification, software defined network.
    CHEN Sihui, born in 2001, M. S. candidate. Her research interests include intrusion detection, network security, data mining.
    LIU Jiayu, born in 2000, M. S. candidate. His research interests include network security, deep learning, traffic classification.
  • Supported by:
    Shandong Provincial Natural Science Foundation (ZR2023MF052)

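Network traffic classification model integrating variational autoencoder and adaptive boosting convolutional neural network

Daoquan LI (李道全), Zheng XU (徐正), Sihui CHEN (陈思慧), Jiayu LIU (刘嘉宇)

  1. School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China
  • Corresponding author: Zheng XU
  • About the authors: LI Daoquan (born 1967), male, from Rizhao, Shandong; professor, Ph.D., CCF member. Research interests: internet of things, software defined network, network security, electronic commerce.
    XU Zheng (born 2001), male, from Fuzhou, Jiangxi; M.S. candidate. Research interests: network security, machine learning, deep learning, traffic classification, software defined network. 1455080545@qq.com
    CHEN Sihui (born 2001), female, from Xiaogan, Hubei; M.S. candidate. Research interests: intrusion detection, network security, data mining.
    LIU Jiayu (born 2000), male, from Zaozhuang, Shandong; M.S. candidate. Research interests: network security, deep learning, traffic classification.
  • Supported by:
    General Program of the Shandong Provincial Natural Science Foundation (ZR2023MF052)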

Abstract:

Network traffic classification is a long-standing problem whose methods have had to evolve alongside network communication, and many solutions have been proposed. Most current classification methods are evaluated on class-balanced datasets because these simplify experiments and computation, yet most real-world network datasets are unbalanced. To address this, a network traffic classification model, VAE-ABC (Variational AutoEncoder-Adaptive Boosting-Convolutional neural network), was proposed by integrating a Variational AutoEncoder (VAE) with an Adaptive Boosting Convolutional Neural Network (AdaBoost-CNN). Firstly, at the data level, VAE was used to partially augment the unbalanced dataset, and the VAE's ability to learn the latent distribution of the data was exploited to shorten learning time. Secondly, at the algorithm level, following the idea of ensemble learning, an AdaBoost-CNN algorithm was designed on the basis of the Adaptive Boosting (AdaBoost) algorithm, using an improved Convolutional Neural Network (CNN) as the weak classifier, to improve learning and training accuracy. Finally, a fully connected layer was used to complete the feature mapping, and the final classification results were obtained through the Sigmoid activation function. Results of multiple comparison experiments show that the proposed model achieves an accuracy of 94.31% on an unbalanced sub-dataset partitioned from the ISCX VPN-nonVPN classification dataset. Compared with AdaBoost-SVM (which uses a Support Vector Machine (SVM) as the weak classifier), SMOTE-SVM (which combines SMOTE (Synthetic Minority Oversampling TEchnique) with SVM), and SMOTE-AB-D-T (which uses a Decision Tree (D-T) as the weak classifier and is combined with SMOTE), the proposed model improves accuracy by 1.34, 0.63 and 0.24 percentage points, respectively. The proposed model therefore classifies better than the compared models on this dataset.
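
The algorithm-level step described above (boosting a sequence of weak classifiers and combining them by weighted vote) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the multi-class SAMME variant of AdaBoost, which the abstract does not specify, and it takes the weak classifier's predictions as plain lists, so the improved CNN itself is left out. The function names `samme_round` and `weighted_vote` are hypothetical.

```python
import math


def samme_round(y_true, y_pred, weights, n_classes):
    """One boosting round (SAMME variant, an assumption here).

    Returns the weak classifier's vote weight (alpha) and the
    renormalised sample weights for the next round.
    """
    # 1 where the weak classifier was wrong, 0 where it was right.
    miss = [int(p != t) for p, t in zip(y_pred, y_true)]
    # Weighted error rate of this weak classifier.
    err = sum(w * m for w, m in zip(weights, miss)) / sum(weights)
    # Keep the logarithm finite for perfect or fully wrong classifiers.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    # Raise the weight of misclassified samples, then renormalise.
    raised = [w * math.exp(alpha * m) for w, m in zip(weights, miss)]
    total = sum(raised)
    return alpha, [w / total for w in raised]


def weighted_vote(all_preds, alphas, n_classes):
    """Combine weak-classifier predictions by alpha-weighted voting."""
    result = []
    for i in range(len(all_preds[0])):
        scores = [0.0] * n_classes
        for preds, alpha in zip(all_preds, alphas):
            scores[preds[i]] += alpha
        result.append(scores.index(max(scores)))
    return result


# Toy data: 4 samples, 3 classes, the last sample is misclassified.
alpha, new_weights = samme_round([0, 1, 2, 2], [0, 1, 2, 0], [0.25] * 4, 3)
print(alpha, new_weights)
```

In this toy run the misclassified sample ends up carrying two thirds of the total weight, so the next weak classifier is pushed to concentrate on it. In a setup like the one the abstract describes, each round would train a CNN on the reweighted samples before calling the update.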

Key words: network traffic classification, unbalanced dataset, data augmentation, Variational AutoEncoder (VAE), ensemble learning, Adaptive Boosting (AdaBoost) algorithm

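Abstract (Chinese version, translated):

Network traffic classification has long been a difficult problem whose methods are iterated continually as network communication develops, and many solutions exist to date. When classifying network data, most current methods concentrate on class-balanced datasets for ease of experiment and computation. To address the fact that most real-world network datasets remain unbalanced, a network traffic classification model VAE-ABC (Variational AutoEncoder-Adaptive Boosting-Convolutional neural network) that integrates a variational autoencoder (VAE) with an adaptive boosting convolutional neural network (AdaBoost-CNN) is proposed. First, at the data level, VAE is used to partially augment the unbalanced dataset, and its ability to learn the latent distribution of the data is used to shorten learning time. Second, to improve classification at the algorithm level, and drawing on ensemble learning, an AdaBoost-CNN algorithm that uses an improved convolutional neural network (CNN) as the weak classifier is designed on the basis of the adaptive boosting (AdaBoost) algorithm, improving the accuracy of learning and training. Finally, a fully connected layer completes the feature mapping, and the final classification result is obtained through the Sigmoid activation function. Results of multiple comparison experiments show that the proposed model reaches an accuracy of 94.31% on the unbalanced sub-dataset partitioned from the classification dataset ISCX VPN-nonVPN. Compared with AdaBoost-SVM, which uses a support vector machine (SVM) as the weak classifier, SMOTE-SVM, which combines the SMOTE (Synthetic Minority Oversampling TEchnique) algorithm with SVM, and SMOTE-AB-D-T, which uses a decision tree (D-T) as the weak classifier combined with SMOTE, the accuracy of the proposed model is higher by 1.34, 0.63 and 0.24 percentage points, respectively. The proposed model therefore classifies better than the other models on this dataset.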
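
The data-level step relies on a VAE learning the latent distribution of the data. For reference, the standard VAE training objective and reparameterisation are given below. The abstract does not state the exact loss used in the paper, so this is the textbook formulation, with encoder parameters \phi, decoder parameters \theta and a standard normal prior p(z) taken as assumptions.

```latex
\mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right),
\qquad
z = \mu_\phi(x) + \sigma_\phi(x)\odot\varepsilon,\quad
\varepsilon\sim\mathcal{N}(0,I)
```

Under this formulation, new minority-class samples could be generated by drawing a latent vector z and passing it through the trained decoder p_\theta(x \mid z). Whether the paper samples from the prior or from the posterior of existing minority samples is not stated in the abstract.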

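Key words (Chinese version, translated): network traffic classification, unbalanced dataset, data augmentation, variational autoencoder, ensemble learning, adaptive boosting algorithm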

CLC Number: