Journal of Computer Applications, 2025, Vol. 45, Issue (3): 872-882. DOI: 10.11772/j.issn.1001-9081.2024030325

• Cyberspace Security •

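Encrypted traffic classification method based on Attention-1DCNN-CE

Haijun GENG1,2, Yun DONG1, Zhiguo HU3,4, Haotian CHI1, Jing YANG1, Xia YIN5

  1. School of Automation and Software Engineering, Shanxi University, Taiyuan 030031
    2. Shanxi Qingzhong Technology Company Limited, Taiyuan 030006
    3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
    4. Key Laboratory of Embedded System and Service Computing, Ministry of Education (Tongji University), Shanghai 201804
    5. Department of Computer Science and Technology, Tsinghua University, Beijing 100084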
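  • Received: 2024-03-25 Revised: 2024-05-27 Accepted: 2024-05-28 Online: 2024-07-22 Published: 2025-03-10
  • Corresponding author: Haijun GENG
  • About the authors: DONG Yun (born 1997), male, from Jinzhong, Shanxi, M.S. candidate. Main research interest: cybersecurity.
    HU Zhiguo (born 1977), male, from Taiyuan, Shanxi, associate professor, Ph.D., CCF member. Main research interest: cybersecurity.
    CHI Haotian (born 1990), male, from Taiyuan, Shanxi, lecturer, Ph.D. Main research interests: Internet of Things security, wireless security, privacy protection.
    YANG Jing (born 1990), female, from Taiyuan, Shanxi, lecturer, Ph.D. Main research interests: fuzzy systems, image reconstruction.
    YIN Xia (born 1972), female, from Beijing, professor, Ph.D., CCF member. Main research interests: next-generation Internet architecture, protocol testing.
  • Funding:
    National Natural Science Foundation of China (62472267); Fundamental Research Program of Shanxi Province (20210302123444)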

Encrypted traffic classification method based on Attention-1DCNN-CE

Haijun GENG1,2, Yun DONG1, Zhiguo HU3,4, Haotian CHI1, Jing YANG1, Xia YIN5

  1. School of Automation and Software Engineering, Shanxi University, Taiyuan, Shanxi 030031, China
    2. Shanxi Qingzhong Technology Company Limited, Taiyuan, Shanxi 030006, China
    3. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China
    4. Key Laboratory of Embedded System and Service Computing, Ministry of Education (Tongji University), Shanghai 201804, China
    5. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2024-03-25 Revised: 2024-05-27 Accepted: 2024-05-28 Online: 2024-07-22 Published: 2025-03-10
  • Contact: Haijun GENG
  • About author: DONG Yun, born in 1997, M.S. candidate. His research interests include cybersecurity.
    HU Zhiguo, born in 1977, Ph.D., associate professor. His research interests include cybersecurity.
    CHI Haotian, born in 1990, Ph.D., lecturer. His research interests include Internet of Things security, wireless security, and privacy protection.
    YANG Jing, born in 1990, Ph.D., lecturer. Her research interests include fuzzy systems and image reconstruction.
    YIN Xia, born in 1972, Ph.D., professor. Her research interests include next-generation Internet architecture and protocol testing.
  • Supported by:
    National Natural Science Foundation of China (62472267); Fundamental Research Program of Shanxi Province (20210302123444)

Abstract:

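Traditional encrypted traffic identification methods suffer from low multi-class accuracy, weak generalization, and a tendency to infringe on privacy. To address these problems, a multi-class deep learning model that combines the attention mechanism (Attention) with a one-dimensional convolutional neural network (1DCNN), named Attention-1DCNN-CE, is proposed. The model has three core parts: 1) in the dataset preprocessing stage, the spatial relationships among packets in the original data flow are preserved, and a cost-sensitive matrix is built from the sample distribution; 2) starting from preliminarily extracted encrypted traffic features, the Attention and 1DCNN models are used to mine and compress the global and local features of the traffic; 3) to deal with data imbalance, the cost-sensitive matrix is combined with the cross-entropy (CE) loss function, which significantly improves classification accuracy on minority-class samples and in turn the overall performance of the model. Experimental results show that the overall identification accuracy of the model exceeds 97% on the BOT-IOT and TON-IOT datasets. The model also performs well on the public datasets ISCX-VPN and USTC-TFC, reaching performance close to that of ET-BERT (Encrypted Traffic BERT) without any pre-training. Compared with PERT (Payload Encoding Representation from Transformer), the model raises the F1 score for application type detection on the ISCX-VPN dataset by 29.9 percentage points. These results verify the effectiveness of the model and provide a solution for encrypted traffic identification and malicious traffic detection.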
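The split between global and local feature extraction in the second part can be illustrated with a toy forward pass. The following is a minimal sketch only: it assumes self-attention without learned projections (queries, keys, and values are all the input itself), a single convolution channel, and a hand-picked kernel and input. The abstract does not specify the layer configuration, so every function name and value here is hypothetical and not taken from the authors' model.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (assumption:
    no learned projections). x is a list of T vectors of dimension d.
    Every output position mixes information from all positions (global)."""
    d = len(x[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
              for q in x]
    weights = [softmax(r) for r in scores]
    out = [[sum(w * v[j] for w, v in zip(wr, x)) for j in range(d)]
           for wr in weights]
    return out, weights

def conv1d_relu(x, kernel, bias=0.0):
    """'Valid' 1D convolution along the sequence axis (cross-correlation,
    as in most deep learning libraries) followed by ReLU. Each output
    value only sees len(kernel) neighbouring positions (local)."""
    k = len(kernel)
    out = []
    for t in range(len(x) - k + 1):
        s = bias
        for i in range(k):
            s += sum(a * b for a, b in zip(x[t + i], kernel[i]))
        out.append(max(0.0, s))
    return out

def global_max_pool(feature_map):
    """Compress a feature map to its single strongest response."""
    return max(feature_map)

# Toy "flow": 4 positions, 2 features each (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
attended, attn = self_attention(x)           # global context
kernel = [[1.0, -1.0], [0.5, 0.5]]           # kernel width 2, hypothetical
feature_map = conv1d_relu(attended, kernel)  # local patterns
pooled = global_max_pool(feature_map)        # compressed feature
```

In a trained model the projections and kernels would be learned and there would be many channels; the sketch only shows why attention output depends on the whole sequence while each convolution output depends on a short window.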

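Keywords: cybersecurity, encrypted traffic, attention mechanism, one-dimensional convolutional neural network, data imbalance, cost-sensitive matrix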

Abstract:

To address the low multi-class accuracy, weak generalization, and risk of privacy violation of traditional encrypted traffic identification methods, a multi-class deep learning model named Attention-1DCNN-CE, which combines the Attention mechanism (Attention) with a one-Dimensional Convolutional Neural Network (1DCNN), was proposed. The model consists of three core components: 1) in the dataset preprocessing stage, the spatial relationships among packets in the original data flow are retained, and a cost-sensitive matrix is constructed from the sample distribution; 2) after preliminary extraction of encrypted traffic features, the Attention and 1DCNN models are used to mine and compress the global and local features of the traffic; 3) to handle data imbalance, the cost-sensitive matrix is combined with the Cross Entropy (CE) loss function, which significantly improves the classification accuracy on minority-class samples and thereby the overall performance of the model. Experimental results show that the overall identification accuracy of the model exceeds 97% on the BOT-IOT and TON-IOT datasets. On the public datasets ISCX-VPN and USTC-TFC, the model also performs well, achieving performance close to that of ET-BERT (Encrypted Traffic BERT) without pre-training. Compared with PERT (Payload Encoding Representation from Transformer), the model improves the F1 score for application type detection on the ISCX-VPN dataset by 29.9 percentage points. These results verify the effectiveness of the model, which offers a solution for encrypted traffic identification and malicious traffic detection.
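One common way to realize the third component is to scale each sample's cross-entropy term by a class-dependent cost. The sketch below assumes inverse-frequency weights, w_c = N / (K * n_c), which corresponds to using only per-class costs rather than a full matrix. The abstract does not give the exact construction of the cost-sensitive matrix, so the weighting scheme and the names `class_weights` and `weighted_cross_entropy` are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights w_c = N / (K * n_c).
    Assumed scheme for illustration; the paper's matrix may differ."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_cross_entropy(logits, label, weights):
    """Cost-weighted cross entropy for one sample:
    -w[label] * log(softmax(logits)[label])."""
    p = softmax(logits)
    return -weights[label] * math.log(p[label])

# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
w = class_weights(labels)

# Same uninformative prediction for both classes; only the cost differs.
loss_major = weighted_cross_entropy([0.0, 0.0], 0, w)
loss_minor = weighted_cross_entropy([0.0, 0.0], 1, w)
```

With these weights, an equally uncertain prediction on a minority-class sample contributes nine times as much to the loss as one on a majority-class sample, which is the mechanism by which such a loss pushes the model to attend to rare classes.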

Key words: cybersecurity, encrypted traffic, Attention mechanism (Attention), one-Dimensional Convolutional Neural Network (1DCNN), data imbalance, cost-sensitive matrix

CLC number: