Journal of Computer Applications, 2022, Vol. 42, Issue (10): 3084-3090. DOI: 10.11772/j.issn.1001-9081.2021081452

• Cyber security •

Tor website traffic analysis model based on self-attention mechanism and spatiotemporal features

Rongkang XI, Manchun CAI, Tianliang LU, Yanlin LI

  1. School of Information Network Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2021-08-17 Revised: 2021-12-03 Accepted: 2021-12-06 Online: 2022-01-07 Published: 2022-10-10
  • Contact: Manchun CAI
  • About the authors: XI Rongkang (first contact), born in 1997 in Sanmenxia, Henan, M. S. candidate. His research interests include anonymous communication.
    CAI Manchun, born in 1972 in Baoding, Hebei, Ph. D., associate professor. His research interests include cryptography and communication security; caimanchun@ppsuc.edu.cn
    LU Tianliang, born in 1985 in Baoding, Hebei, Ph. D., associate professor. His research interests include malicious code detection and artificial intelligence.
    LI Yanlin, born in 1997 in Yulin, Guangxi, M. S. candidate. His research interests include information network security.
  • Supported by:
    Key Research Project of Cryptology Theory of the “13th Five-Year Plan” National Cryptology Development Fund of China (MMJJ20180108); Major Project of Basic Scientific Research Expenses of People’s Public Security University of China in 2020 (2020JKF101)

Abstract:

Criminals use the onion router (Tor) anonymous communication system to carry out criminal activities on the dark web, which poses severe challenges to public security. Tor website traffic analysis captures and analyzes Tor anonymous network traffic so that illegal behavior hidden on the Internet can be discovered in time and network supervision can be carried out. To this end, a Tor website traffic analysis model based on Self-Attention and Hierarchical SpatioTemporal features (SA-HST) was proposed. Firstly, an attention mechanism was introduced to assign different weights to the network traffic features and highlight the important ones. Then, a multi-channel Convolutional Neural Network (CNN) with a parallel structure and a Long Short-Term Memory (LSTM) network were used to extract the spatiotemporal features of the input data. Finally, the Softmax function was used to classify the data. In the closed-world scenario, SA-HST achieved an accuracy of 97.14%, which is 8.74 and 7.84 percentage points higher than that of the cumulative-sum-based model CUMUL (CUMULative sum fingerprinting) and the deep learning model CNN, respectively. In the open-world scenario, all confusion-matrix-based evaluation metrics of SA-HST remained above 96%. Experimental results show that the self-attention mechanism can extract features efficiently with a lightweight model structure, and that, by capturing both the important features and the multi-view spatiotemporal features of anonymous traffic for classification, SA-HST has certain advantages in classification accuracy, training efficiency and robustness.
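The three stages named in the abstract (attention weighting, parallel convolution branches, Softmax classification) can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the SA-HST implementation: the input encoding (a per-packet `[direction, size]` pair), the identity query/key/value projections, the single filter per branch, the kernel widths and the random weights are all hypothetical, and the LSTM branch is omitted for brevity.

```python
import math
import random


def softmax(xs):
    """Numerically stable Softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections), so there are no learned parameters here.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def conv_branch(seq, kernel):
    """One 1-D convolution filter over time, then ReLU and global max pooling."""
    k, d = len(kernel), len(seq[0])
    acts = []
    for t in range(len(seq) - k + 1):
        s = sum(kernel[i][j] * seq[t + i][j] for i in range(k) for j in range(d))
        acts.append(max(0.0, s))
    return max(acts)


def classify(seq, kernels, weights):
    """Attention, then parallel conv branches, then a linear layer and Softmax."""
    attended = self_attention(seq)
    feats = [conv_branch(attended, kern) for kern in kernels]
    logits = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    return softmax(logits)


# Hypothetical toy data: 12 packets, each [direction (+1/-1), normalized size].
rng = random.Random(0)
trace = [[rng.choice([-1.0, 1.0]), rng.random()] for _ in range(12)]
# Three parallel branches with different kernel widths (different "views").
kernels = [[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(k)] for k in (2, 3, 5)]
# Untrained linear layer mapping 3 branch features to 4 hypothetical site classes.
weights = [[rng.uniform(-1, 1) for _ in kernels] for _ in range(4)]
probs = classify(trace, kernels, weights)
print(probs)
```

The output is a probability vector over the hypothetical site classes; with untrained random weights it carries no meaning beyond showing how data flows through the stages.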

Key words: self-attention mechanism, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, The Onion Router (Tor), traffic analysis

CLC number: