Journal of Computer Applications, 2021, Vol. 41, Issue (5): 1386-1391. DOI: 10.11772/j.issn.1001-9081.2020071073

Special Issue: Cyber security

• Cyber security •

Encrypted traffic classification method based on data stream

GUO Shuai1,2, SU Yang1,2   

  1. College of Cryptographic Engineering, Engineering University of PAP, Xi'an Shaanxi 710086, China;
  2. Key Laboratory of Network and Information Security under the Armed Police Force (Engineering University of PAP), Xi'an Shaanxi 710086, China
  • Received: 2020-07-23 Revised: 2020-10-15 Online: 2021-05-10 Published: 2020-11-12


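  • Corresponding author: SU Yang (苏旸)
  • About the authors: GUO Shuai (郭帅), born 1992, male, from Puyang, Henan, M.S. candidate; research interests: encrypted traffic identification and deep learning. SU Yang (苏旸), born 1975, male, from Xi'an, Shaanxi, professor, Ph.D.; research interests: network security and information countermeasures.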

Abstract: To classify encrypted traffic in current networks quickly and identify it accurately, a new feature extraction method for data streams was proposed. Based on the characteristics of sequential data and the regularities of the SSL (Secure Sockets Layer) handshake protocol, an end-to-end one-dimensional convolutional neural network model was adopted, and five-tuples were used to label the data streams. By selecting the data stream representation, the number of data packets, and the length of the feature bytes, the positions of the key fields for sample classification were located more accurately, and features with little impact on classification were removed. As a result, the input for a single data stream was reduced from the original 784 bytes to 529 bytes, about 32% shorter, and 12 encrypted traffic service types were classified with an accuracy of 95.5%. These results show that the proposed method reduces the input feature dimension and improves data processing efficiency while maintaining the accuracy achieved in current research.

Key words: encrypted traffic classification, end-to-end, Convolutional Neural Network (CNN), data stream, five-tuple, Secure Sockets Layer (SSL) protocol
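
The preprocessing described in the abstract (labeling data streams by five-tuple and feeding a fixed 529-byte input to the network) might look roughly like the following minimal sketch. The `Packet` type, its field names, and the helper functions are hypothetical illustrations rather than the authors' code, and the simple "concatenate, then truncate or zero-pad" rule is an assumption: the abstract does not state how many packets or how many bytes per packet make up the 529 bytes. Only the 784 and 529 byte lengths come from the abstract.

```python
from collections import defaultdict
from typing import NamedTuple

ORIGINAL_LEN = 784  # bytes per data stream in the original input (from the abstract)
REDUCED_LEN = 529   # bytes per data stream after feature selection (from the abstract)


class Packet(NamedTuple):
    """Hypothetical parsed packet; the field names are illustrative only."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    data: bytes


def five_tuple(pkt):
    """Return the five-tuple key that labels the data stream of a packet."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)


def group_flows(packets):
    """Group packets into data streams by five-tuple, keeping arrival order."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[five_tuple(pkt)].append(pkt)
    return dict(flows)


def flow_to_vector(flow, length=REDUCED_LEN):
    """Concatenate packet bytes, truncate or zero-pad to `length`,
    and scale each byte to [0, 1] (assumed input form for a 1D CNN)."""
    raw = b"".join(pkt.data for pkt in flow)[:length]
    raw = raw.ljust(length, b"\x00")
    return [b / 255.0 for b in raw]


def reduction_ratio(original=ORIGINAL_LEN, reduced=REDUCED_LEN):
    """Fraction of the original input length that was removed."""
    return (original - reduced) / original
```

With these lengths, `reduction_ratio()` evaluates to 255/784, which is the roughly 32% reduction quoted in the abstract.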

