Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (1): 20-25. DOI: 10.11772/j.issn.1001-9081.2017071812

• Papers of the 2017 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2017) •

New traffic classification method for imbalanced network data (非平衡网络流量识别方法)

YAN Binghao (燕昺昊), HAN Guodong (韩国栋), HUANG Yajing (黄雅静), WANG Xiaolong (王孝龙)

  1. National Digital Switching System Engineering & Technological Research Center, Zhengzhou Henan 450002, China
  • Received: 2017-07-24 Revised: 2017-08-01 Online: 2018-01-10 Published: 2018-01-22
  • Corresponding author: YAN Binghao
  • About the authors: YAN Binghao, born in 1994 in Lüliang, Shanxi, M.S. candidate, CCF member. His research interests include traffic identification, intrusion detection and protocol parsing. HAN Guodong, born in 1964 in Laixi, Shandong, Ph.D., associate professor. His research interests include broadband information processing and information security, and chip design and application. HUANG Yajing, born in 1984 in Changsha, Hunan, Ph.D., assistant research fellow. Her research interests include chip design and signal processing. WANG Xiaolong, born in 1993 in Minquan, Henan, M.S. candidate. His research interests include broadband information networks and protocol parsing.
  • Supported by:
    This work is partially supported by the National Science and Technology Major Project of China (2016ZX01012101), the General Program of the National Natural Science Foundation of China (61572520), and the Innovation Group Project of the National Natural Science Foundation of China (61521003).

Abstract: To address the traffic imbalance caused by the flood of Peer-to-Peer (P2P) traffic, which far outweighs non-P2P traffic in networks, imbalanced data classification was applied to traffic identification. The Synthetic Minority Over-sampling Technique (SMOTE) algorithm was introduced and improved into a Mean SMOTE (M-SMOTE) algorithm to balance the traffic data. On this basis, three machine learning classifiers, Random Forest (RF), Support Vector Machine (SVM) and Back Propagation Neural Network (BPNN), were used to identify each class of traffic after balancing. Theoretical analysis and simulation results show that, without affecting the recognition accuracy of P2P traffic, the SMOTE algorithm improves the recognition accuracy of non-P2P traffic by 16.5 percentage points on average and raises the overall recognition rate of network traffic by 9.5 percentage points compared with the imbalanced state. Compared with the SMOTE algorithm, the M-SMOTE algorithm further improves the recognition accuracy of non-P2P traffic and the overall recognition rate of network traffic by 3.2 percentage points and 2.6 percentage points respectively. The experimental results show that imbalanced data classification can effectively solve the problem of the low non-P2P traffic recognition rate caused by excessive P2P traffic, and that the M-SMOTE algorithm achieves higher recognition accuracy than SMOTE.

Key words: imbalanced data, Peer-to-Peer (P2P) traffic, traffic classification, machine learning, Synthetic Minority Over-sampling Technique (SMOTE) algorithm
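
The abstract does not spell out how M-SMOTE modifies SMOTE, so the sketch below illustrates only the plain SMOTE idea it builds on: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours. It is a simplified illustration and not the authors' implementation; the function name `smote`, the parameters `n_synthetic`, `k` and `seed`, and the toy `non_p2p` feature vectors are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: interpolate between minority samples and
    their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours inside the minority class.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        # Place the new point somewhere on the segment base -> target.
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Hypothetical toy data: 2-D feature vectors of a minority (non-P2P) class.
non_p2p = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(non_p2p, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region the minority class already occupies. In a pipeline like the one the abstract describes, the balanced data would then be passed to a classifier such as RF, SVM or BPNN.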

CLC number: