Journal of Computer Applications
李道全,徐正,陈思慧,刘嘉宇
Abstract: Network traffic classification is a long-standing problem whose methods have had to evolve alongside network communication, and many solutions have been developed. At present, most work on network data classification focuses on class-balanced data sets, which simplifies experiments and computation, yet most real network data sets remain unbalanced. To address this, VAE-ABC, a network traffic classification model integrating a Variational AutoEncoder (VAE) and an Adaptive Boosting Convolutional Neural Network (AdaBoost-CNN), was proposed. Firstly, at the data level, the VAE was used to partially augment the unbalanced data set, and its ability to learn the latent distribution of the data was exploited to shorten learning time. Then, at the algorithm level, drawing on the idea of ensemble learning, an AdaBoost-CNN algorithm based on AdaBoost and using an improved CNN as the weak classifier was designed to improve the accuracy of learning and training. Finally, a fully connected layer was used to complete feature mapping, and the final classification result was obtained through a sigmoid activation function. Multiple comparison experiments show that the model achieved an accuracy of 94.31% on an unbalanced subset partitioned from the ISCX VPN-nonVPN 2016 classification dataset. Compared with AdaBoost-SVM, which uses a Support Vector Machine (SVM) as the weak classifier; SMOTE-SVM, which combines the Synthetic Minority Oversampling Technique (SMOTE) with the SVM; and SMOTE-AB-D-T, which uses a Decision Tree (D-T) as the weak classifier and is combined with the SMOTE algorithm, the accuracy was higher by 1.18, 0.47, and 0.08 percentage points, respectively. The proposed model therefore classifies better than the compared models on this data set.
Key words: network traffic classification, unbalanced data set, data augmentation, Variational AutoEncoder (VAE), ensemble learning, Adaptive Boosting (AdaBoost)
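The boosting idea named in the abstract can be illustrated with a minimal sketch of discrete two-class AdaBoost. This is not the authors' AdaBoost-CNN: a one-dimensional threshold stump stands in for the improved CNN weak classifier, and the function names (`fit_stump`, `stump_predict`, `adaboost`, `predict`), the -1/+1 label encoding, and the toy data are all assumptions made for illustration. The sketch shows only how sample weights shift toward misclassified examples between rounds and how the weak learners are combined by weighted vote.

```python
import math

def stump_predict(x, threshold, sign):
    """Hypothetical weak learner: predict `sign` at or above the threshold, `-sign` below it."""
    return sign if x >= threshold else -sign

def fit_stump(xs, ys, w):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if stump_predict(x, t, sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best  # (weighted error, threshold, polarity)

def adaboost(xs, ys, rounds=3):
    """Discrete AdaBoost with labels in {-1, +1}; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak learner
        ensemble.append((alpha, t, sign))
        # Increase the weight of misclassified samples, decrease the rest, then renormalize.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all weak learners."""
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data (an assumption): no single stump separates it, but three boosted stumps do.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

In the model described above, the stump would be replaced by a CNN trained on the reweighted samples, and the training set would first be augmented with VAE-generated samples. Neither step is reproduced here.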
CLC Number: TP309.2
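李道全, 徐正, 陈思慧, 刘嘉宇. Network traffic classification model integrating variational autoencoder and adaptive boosting convolutional neural network [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2024060840.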
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024060840