Encrypted traffic classification method based on data stream

doi:10.11772/j.issn.1001-9081.2020071073

Abstract

To address the problems of fast classification and accurate identification of encrypted traffic in current networks, a new feature extraction method for data streams was proposed. Based on the characteristics of sequential data and the regularities of the SSL (Secure Sockets Layer) handshake protocol, an end-to-end one-dimensional convolutional neural network model was adopted, and five-tuples were used to label the data streams. By selecting the data stream representation, the number of data packets, and the length of the feature bytes, the positions of the key fields for sample classification were located more accurately, and features with little impact on sample classification were removed. As a result, the input for a single data stream was reduced from the original 784 bytes to 529 bytes, a reduction of about 32% of the original length, and 12 encrypted traffic service types were classified with an accuracy of 95.5%. These results show that the proposed method can reduce the dimension of the original input features and improve data processing efficiency while maintaining the accuracy achieved by current research.
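The preprocessing described above (label each data stream by its five-tuple, keep only the first few packets, and fix the feature length at 529 bytes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `FiveTuple`, `group_by_five_tuple`, `stream_to_features`, and `reduction_ratio` are hypothetical, the streams are treated as unidirectional, zero-padding is assumed for short streams, and the default of 3 packets per stream is a placeholder, since the abstract does not state the number of packets selected.

```python
from collections import defaultdict
from typing import NamedTuple

FEATURE_LEN = 529   # bytes kept per data stream (figure from the abstract)
ORIGINAL_LEN = 784  # bytes per data stream in the original input (from the abstract)


class FiveTuple(NamedTuple):
    """Hypothetical stream label: the classic network five-tuple."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def group_by_five_tuple(packets):
    """Group (FiveTuple, payload_bytes) pairs into streams, keeping arrival order."""
    streams = defaultdict(list)
    for key, payload in packets:
        streams[key].append(payload)
    return dict(streams)


def stream_to_features(payloads, max_packets=3, length=FEATURE_LEN):
    """Concatenate the first max_packets payloads, then truncate or zero-pad.

    max_packets=3 is an assumed placeholder, not a value from the paper.
    """
    raw = b"".join(payloads[:max_packets])[:length]
    return raw + b"\x00" * (length - len(raw))


def reduction_ratio(original=ORIGINAL_LEN, reduced=FEATURE_LEN):
    """Fraction of the original input length that is removed."""
    return (original - reduced) / original


if __name__ == "__main__":
    a = FiveTuple("10.0.0.1", 50000, "10.0.0.2", 443, "TCP")
    b = FiveTuple("10.0.0.3", 50001, "10.0.0.2", 443, "TCP")
    packets = [(a, b"\x16\x03\x01" + b"A" * 300), (b, b"B" * 10), (a, b"C" * 300)]
    for key, payloads in group_by_five_tuple(packets).items():
        print(key.src_ip, len(stream_to_features(payloads)))
    print(f"reduction: {reduction_ratio():.1%}")
```

Each resulting 529-byte vector would then serve as one input sample to the one-dimensional convolutional network. The ratio (784 - 529) / 784 works out to roughly 32.5%, consistent with the reduction of about 32% reported in the abstract.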

Key words: encrypted traffic classification, end-to-end, Convolutional Neural Network (CNN), data stream, five-tuple, Secure Sockets Layer (SSL) protocol

CLC Number: TP393.08

GUO Shuai, SU Yang. Encrypted traffic classification method based on data stream[J]. Journal of Computer Applications, 2021, 41(5): 1386-1391.
