Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (3): 802-811. DOI: 10.11772/j.issn.1001-9081.2018071552

• Network and Communications •

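  • About the authors: ZOU Tengkuan (born 1995), male, from Shenyang, Liaoning, M.S. candidate; main research interest: network security. WANG Yuying (born 1995), female, from Wuhu, Anhui, M.S. candidate; main research interest: network security. WU Chengrong (born 1971), male, from Shanghai, associate professor, Ph.D., CCF member; main research interest: information security.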

Review of network background traffic classification and identification

ZOU Tengkuan1,2, WANG Yuying1,2, WU Chengrong1,2   

  1. School of Computer Science, Fudan University, Shanghai 200433, China;
    2. Engineering Research Center of Cyber Security Auditing and Monitoring, Ministry of Education, Shanghai 200433, China
  • Received: 2018-07-26 Revised: 2018-11-08 Online: 2019-03-10 Published: 2019-03-11
  • Contact: ZOU Tengkuan
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2017YFB0803203).

Abstract: Internet traffic classification is the process of identifying network applications and classifying the corresponding traffic, and it is regarded as the most basic function of modern network management and security systems. Application-level traffic classification is likewise a foundational technology for network security. Traditional traffic classification methods include port-based prediction and payload-based deep inspection. In current network environments these traditional methods face practical problems such as dynamic ports and encrypted applications, so Machine Learning (ML) techniques based on the statistical features of traffic have been adopted for traffic classification and identification. Machine learning can automatically search the provided traffic data for useful structural patterns and describe them, which helps classify traffic intelligently. Initially, the Naive Bayes method was used to identify and classify network traffic; it performed well in experiments on specific kinds of traffic, with accuracy above 90%, but reached only about 50% accuracy on traffic such as Peer-to-Peer (P2P) traffic. Later, methods such as Support Vector Machine (SVM) and Neural Network (NN) were applied, and neural network methods raised the classification accuracy for overall network traffic to above 80%. A number of studies show that applying a variety of machine learning methods, together with their subsequent refinements, has substantially improved the accuracy of traffic classification.

Key words: traffic classification, background traffic, Machine Learning (ML), Deep Packet Inspection (DPI) technology, classification based on behavior patterns
