Review of network background traffic classification and identification

doi:10.11772/j.issn.1001-9081.2018071552

Abstract

Internet traffic classification is the process of identifying network applications and classifying the corresponding traffic, and it is regarded as the most basic function of modern network management and security systems. Application-related traffic classification is a fundamental technology for network security. Traditional traffic classification methods include port-based prediction and payload-based deep inspection. In current network environments these traditional methods run into practical problems such as dynamic ports and encrypted applications, so Machine Learning (ML) techniques based on statistical features of traffic are used instead to classify and identify traffic. Machine learning can automatically search the supplied traffic data for useful structural patterns and describe them, which helps classify traffic intelligently. Early work used the Naive Bayes method to identify and classify network traffic; it performed well in experiments on specific traffic types, with accuracy above 90%, but reached only about 50% accuracy on traffic such as Peer-to-Peer (P2P) traffic. Later work used methods such as Support Vector Machine (SVM) and Neural Network (NN); the neural network methods raised the classification accuracy on overall network traffic to above 80%. Multiple studies show that applying a variety of machine learning methods, together with subsequent improvements to them, has substantially improved the accuracy of traffic classification.

Key words: traffic classification, background traffic, Machine Learning (ML), Deep Packet Inspection (DPI), classification based on behavior patterns
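
As a rough illustration of the statistics-based approach described in the abstract, the following minimal sketch trains a Gaussian Naive Bayes classifier on per-flow statistical features. It is not taken from any of the surveyed studies: the two features (mean packet size and mean inter-arrival time), the class labels "bulk" and "interactive", the sample values, and the function names `fit_gaussian_nb` and `predict` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)

    total = len(samples)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def predict(model, features):
    """Return the class with the highest log-posterior for one flow."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(features, means, variances):
            # Log of the Gaussian density for this feature under this class.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical flows: (mean packet size in bytes, mean inter-arrival time in ms).
train_x = [(1400, 2), (1350, 3), (1450, 1), (120, 40), (90, 55), (150, 35)]
train_y = ["bulk", "bulk", "bulk", "interactive", "interactive", "interactive"]

model = fit_gaussian_nb(train_x, train_y)
print(predict(model, (1300, 4)))
print(predict(model, (100, 50)))
```

The classifier never looks at port numbers or payload bytes, which is why this family of methods still applies when ports are dynamic or the payload is encrypted. Real studies use far richer feature sets and labeled traces, and the accuracy figures quoted in the abstract cannot be inferred from a toy example like this one.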

CLC number: TP393.02

ZOU Tengkuan (邹腾宽), WANG Yuying (汪钰颖), WU Chengrong (吴承荣). Review of network background traffic classification and identification[J]. Journal of Computer Applications (计算机应用), 2019, 39(3): 802-811.
