
DPCS2017+41+Research on Identification Methods for Imbalanced Network Traffic

YAN Bing-Hao1, HAN Guo-Dong2, HUANG Ya-Jing1, WANG Xiao-Long1

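  1. National Digital Switching System Engineering and Technological Research Center
    2. National Digital Switching System Engineering and Technological Research Center, Zhengzhou 450002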
  • Received: 2017-07-24  Revised: 2017-07-28  Published online: 2017-07-28
  • Corresponding author: YAN Bing-Hao

DPCS2017+41+A Novel Traffic Classification Method Based on Imbalanced Data

  • Received:2017-07-24 Revised:2017-07-28 Online:2017-07-28
  • Contact: Bing-Hao YAN

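Abstract: To address the traffic imbalance caused by the flood of P2P (Peer-to-Peer) traffic in networks, this paper applies the idea of imbalanced-data classification to traffic identification. By adopting and improving the Synthetic Minority Over-sampling Technique (SMOTE), it proposes a mean SMOTE algorithm (M-SMOTE) that balances the traffic data. Three machine learning classifiers, Random Forest (RF), Support Vector Machine (SVM), and Back Propagation Neural Network (BPNN), are then used to identify each class of the balanced traffic. Experimental results show that, without reducing the identification accuracy for P2P traffic, M-SMOTE raises the identification accuracy for non-P2P traffic by 19.8% on average and the overall identification rate for network traffic by 12.1%. This resolves the low identification rate for non-P2P traffic caused by the excess of P2P traffic.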
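The abstract does not describe how M-SMOTE changes SMOTE, so this example shows only standard SMOTE, the baseline the paper builds on. In SMOTE, each synthetic minority-class sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. This is a minimal, dependency-free sketch under stated assumptions. The function name `smote`, its parameters `k` and `seed`, and the toy feature vectors are hypothetical and not taken from the paper. The code is not M-SMOTE.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Standard SMOTE (not the paper's M-SMOTE): create n_synthetic new
    minority-class feature vectors by interpolating between a random minority
    sample and one of its k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Made-up two-feature flow records for the minority (non-P2P) class.
    non_p2p = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2]]
    p2p_count = 10  # hypothetical majority-class size
    new_points = smote(non_p2p, p2p_count - len(non_p2p), k=2, seed=0)
    print(len(non_p2p) + len(new_points))  # 10: classes are now the same size
```

Because every synthetic point is a convex combination of two real minority samples, oversampling stays inside the region the minority class already occupies. This is the property that improved variants such as M-SMOTE adjust in their own ways.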

Keywords: imbalanced data, traffic identification, machine learning, SMOTE algorithm

Abstract: To address a problem in traffic classification, namely that P2P traffic far outnumbers non-P2P traffic, a novel traffic classification method for imbalanced data is presented. To handle the imbalanced traffic, a new method named Mean Synthetic Minority Over-sampling Technique (M-SMOTE) is applied to traffic classification. Three classifiers, Random Forest (RF), Support Vector Machine (SVM), and Back Propagation Neural Network (BPNN), are then used to evaluate classification accuracy on the M-SMOTE-balanced data. Experimental results show that with M-SMOTE, accuracy on the non-P2P class improves by 19.8% on average and overall accuracy on the imbalanced data improves by 12.1%.
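The abstract reports non-P2P accuracy separately from overall accuracy. The short sketch here shows why both numbers matter. It assumes that the "accuracy" of a class means the fraction of that class's flows labelled correctly (per-class recall). The paper's exact metric definitions are not given here, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly
    (per-class recall). This is an assumed definition, not the paper's."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Toy 90:10 imbalance; a classifier that always answers "P2P".
    y_true = ["P2P"] * 90 + ["nonP2P"] * 10
    y_pred = ["P2P"] * 100
    print(overall_accuracy(y_true, y_pred))    # 0.9
    print(per_class_accuracy(y_true, y_pred))  # {'P2P': 1.0, 'nonP2P': 0.0}
```

On this made-up 90:10 set, a classifier that labels every flow as P2P scores 90% overall but recognises none of the non-P2P flows. Balancing the training data by oversampling the minority class is meant to close exactly this kind of gap.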

Keywords: imbalanced data, traffic classification, machine learning, Synthetic Minority Over-sampling Technique (SMOTE)
