Tor website traffic analysis model based on self-attention mechanism and spatiotemporal features

Journal of Computer Applications, 2022, Vol. 42, Issue (10): 3084-3090. DOI: 10.11772/j.issn.1001-9081.2021081452

Section: Cyber Security

Rongkang XI, Manchun CAI, Tianliang LU, Yanlin LI

School of Information Network Security, People's Public Security University of China, Beijing 100038, China

Received: 2021-08-17 Revised: 2021-12-03 Accepted: 2021-12-06 Online: 2022-01-07 Published: 2022-10-10
Corresponding author: Manchun CAI
About the authors:
XI Rongkang (first contact), born in 1997 in Sanmenxia, Henan, M. S. candidate. His research interests include anonymous communication.
CAI Manchun, born in 1972 in Baoding, Hebei, Ph. D., associate professor. His research interests include cryptography and communication security. caimanchun@ppsuc.edu.cn
LU Tianliang, born in 1985 in Baoding, Hebei, Ph. D., associate professor. His research interests include malicious code detection and artificial intelligence.
LI Yanlin, born in 1997 in Yulin, Guangxi, M. S. candidate. His research interests include information network security.
Supported by:
Key Research Project of Cryptology Theory of "the 13th Five Year Plan" National Cryptology Development Fund of China (MMJJ20180108); Major Project of Basic Scientific Research Expenses of People's Public Security University of China in 2020 (2020JKF101)

Abstract

Criminals use The onion router (Tor) anonymous communication system to carry out criminal activities on the dark web, which poses severe challenges to public security. Tor website traffic analysis technology captures and analyzes Tor anonymous network traffic so that illegal activities hidden on the Internet can be discovered in time for network supervision. To this end, a Tor website traffic analysis model based on Self-Attention and Hierarchical SpatioTemporal features (SA-HST) was proposed. Firstly, an attention mechanism was introduced to assign different weights to network traffic features so as to highlight the important ones. Then, a multi-channel Convolutional Neural Network (CNN) with parallel structure and a Long Short-Term Memory (LSTM) network were used to extract the spatiotemporal features of the input data. Finally, the Softmax function was used to classify the data. SA-HST achieved an accuracy of 97.14% in the closed-world scenario, which is 8.74 and 7.84 percentage points higher than the CUMULative sum fingerprinting (CUMUL) model and the deep learning model CNN, respectively. In the open-world scenario, all confusion-matrix evaluation indicators of SA-HST remained stably above 96%. Experimental results show that the self-attention mechanism can extract features efficiently with a lightweight model structure. By capturing the important features and multi-view spatiotemporal features of anonymous traffic for classification, SA-HST has certain advantages in classification accuracy, training efficiency, and robustness.

Key words: self-attention mechanism, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, The onion router (Tor), traffic analysis
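The weighting step described in the abstract can be illustrated with a minimal sketch. This page does not give the paper's exact attention formulation, so the code below assumes plain scaled dot-product self-attention (as in reference 17) with identity projections instead of learned ones. The toy input and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity projections (an assumption).

    sequence: list of feature vectors, one per position in a traffic trace.
    Returns (outputs, weights); weights[i][j] is how much position i attends to j.
    """
    dim = len(sequence[0])
    weights = []
    for query in sequence:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sequence
        ]
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted sum of all input vectors.
    outputs = [
        [sum(w * value[d] for w, value in zip(row, sequence)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical toy input: three positions with two features each.
toy_trace = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(toy_trace)
```

Each row of `weights` sums to 1, so positions whose features align with the query receive a larger share of the output. A trained model would learn separate query, key, and value projections rather than reuse the raw features.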

CLC number: TP309

Rongkang XI, Manchun CAI, Tianliang LU, Yanlin LI. Tor website traffic analysis model based on self-attention mechanism and spatiotemporal features[J]. Journal of Computer Applications, 2022, 42(10): 3084-3090.

Figures/Tables (12)

Fig. 1 Tor network structure

Fig. 2 Flow of SA-HST model

Fig. 3 Self-attention mechanism structure

Fig. 4 Multi-kernel CNN-LSTM layer

Fig. 5 Fingerprint attack flow on Tor website

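Tab. 1 SA-HST model parameters

Layer	Parameters
Data encoding layer	output_dim=128
Self-attention layer	output_dim=128
Multi-kernel CNN layer	kernel sizes 3, 4, and 5; 32 convolution kernels; ReLU activation
LSTM layer	output_dim=128; tanh activation
Fully connected layer	units set to 2/100; Softmax activation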
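To make the multi-kernel CNN row of Table 1 concrete, here is a minimal pure-Python sketch of parallel 1D convolution branches with kernel sizes 3, 4, and 5 followed by ReLU. Only the kernel sizes and the activation come from the table. The input sequence, the all-ones kernel weights, and the function names are hypothetical, and a real layer would use 32 learned kernels rather than one fixed kernel per branch.

```python
def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def conv1d_valid(signal, kernel):
    """1D 'valid' cross-correlation followed by ReLU."""
    width = len(kernel)
    return [
        relu(sum(signal[i + j] * kernel[j] for j in range(width)))
        for i in range(len(signal) - width + 1)
    ]


def multi_kernel_features(signal, kernels_by_size):
    """Run one branch per kernel size on the same input and keep every feature map."""
    return {
        size: conv1d_valid(signal, kernel)
        for size, kernel in kernels_by_size.items()
    }


# Hypothetical packet-direction sequence (+1 outgoing, -1 incoming).
trace = [1, 1, -1, -1, -1, 1, -1, -1]
# Kernel sizes follow Table 1; the all-ones weights are illustrative only.
kernels = {size: [1.0] * size for size in (3, 4, 5)}
feature_maps = multi_kernel_features(trace, kernels)
```

Because each branch sees a different window width, the three feature maps summarize the same trace at different scales, which is one plausible reading of the "multi-view" features mentioned in the abstract.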

Tab. 2 Comparison of model classification accuracy in closed world (%)

Model	Accuracy
CNN [7]	89.30
CUMUL [5]	88.40
LSTM [7]	89.10
SA-CNN	95.73
SA-HST	97.14
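The percentage-point gains quoted in the abstract can be checked directly against the accuracies in Table 2. The short sketch below only does that subtraction. The dictionary and function names are illustrative.

```python
# Closed-world accuracies (%) copied from Table 2.
closed_world_accuracy = {
    "CNN": 89.30,
    "CUMUL": 88.40,
    "LSTM": 89.10,
    "SA-CNN": 95.73,
    "SA-HST": 97.14,
}


def gain_over(baseline, model="SA-HST"):
    """Percentage-point difference between a model and a baseline."""
    return round(closed_world_accuracy[model] - closed_world_accuracy[baseline], 2)


gains = {
    name: gain_over(name)
    for name in closed_world_accuracy
    if name != "SA-HST"
}
```

The CUMUL and CNN entries of `gains` match the 8.74 and 7.84 percentage points stated in the abstract.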

Fig. 6 Comparison of classification accuracy of four models under different epochs

Tab. 3 Comparison of training time per epoch among four models (s)

Model	Training time per epoch
CNN [7]	149
LSTM [7]	160
SA-CNN	40
SA-HST	155

Fig. 7 Comparison of data fitting performance among four models

Fig. 8 Comparison of performance in open world among four models

Fig. 9 Comparison of classification accuracy under different website ratios among four models
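The abstract reports open-world results as confusion-matrix evaluation indicators without listing them on this page. As a hedged illustration, the sketch below computes four common indicators for a binary monitored/unmonitored decision. The choice of indicators and the counts are assumptions, not values from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Common indicators derived from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration only, not results from the paper.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
```

In an open-world setting a positive would typically mean "traffic belongs to a monitored website", so false positives correspond to unmonitored sites flagged by mistake.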

References (22)

1	HINTZ A. Fingerprinting websites using traffic analysis [C]// Proceedings of the 2002 International Workshop on Privacy Enhancing Technologies, LNCS 2482. Cham: Springer, 2003: 171-178.
2	LIBERATORE M, LEVINE B N. Inferring the source of encrypted HTTP connections [C]// Proceedings of the 13th ACM Conference on Computer and Communications Security. New York: ACM, 2006: 255-263. DOI: 10.1145/1180405.1180437
3	DINGLEDINE R, MATHEWSON N, SYVERSON P. Tor: the second-generation onion router [C]// Proceedings of the 13th USENIX Security Symposium. Berkeley: USENIX Association, 2004: 303-320. DOI: 10.21236/ada465464
4	HAYES J, DANEZIS G. k-fingerprinting: a robust scalable website fingerprinting technique [C]// Proceedings of the 25th USENIX Security Symposium. Berkeley: USENIX Association, 2016: 1187-1203.
5	PANCHENKO A, LANZE F, PENNEKAMP J, et al. Website fingerprinting at internet scale [C]// Proceedings of the 23rd Annual Network and Distributed System Security Symposium. Reston, VA: Internet Society, 2016: 1-15. DOI: 10.14722/ndss.2016.23477
6	ABE K, GOTO S. Fingerprinting attack on Tor anonymity using deep learning [EB/OL]. (2016-08-01) [2021-08-22].
7	RIMMER V, PREUVENEERS D, JUAREZ M, et al. Automated website fingerprinting through deep learning [C]// Proceedings of the 25th Annual Network and Distributed System Security Symposium. Reston, VA: Internet Society, 2018: 1-15. DOI: 10.14722/ndss.2018.23105
8	ZHANG J W, LING Y, FU X B, et al. Model of the intrusion detection system based on the integration of spatial-temporal features [J]. Computers and Security, 2020, 89: No.101681. DOI: 10.1016/j.cose.2019.101681
9	MA C C, DU X H, CAO L F, et al. burst-analysis website fingerprinting attack based on deep neural network [J]. Journal of Computer Research and Development, 2020, 57(4): 746-766. DOI: 10.7544/issn1000-1239.2020.20190860
10	ZHANG D W, DUAN H X. Website fingerprinting technique based on image texture [J]. Journal of Computer Applications, 2020, 40(6): 1685-1691. DOI: 10.11772/j.issn.1001-9081.2019111981
11	WANG T, GOLDBERG I. On realistically attacking Tor with website fingerprinting [J]. Proceedings on Privacy Enhancing Technologies, 2016, 2016(4): 21-36. DOI: 10.1515/popets-2016-0027
12	SIRINAM P, MATHEWS N, MOHAMMAD M S, et al. Triplet fingerprinting: more practical and portable website fingerprinting with N-shot learning [C]// Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2019: 1131-1148. DOI: 10.1145/3319535.3354217
13	ATTARIAN R, ABDI L, HASHEMI S. AdaWFPA: adaptive online website fingerprinting attack for tor anonymous network: a stream-wise paradigm [J]. Computer Communications, 2019, 148: 74-85. DOI: 10.1016/j.comcom.2019.09.008
14	CAI M C, WANG T F, YUE T, et al. ARF-based Tor website fingerprint recognition technology [J]. Netinfo Security, 2021, 21(4): 39-48. DOI: 10.3969/j.issn.1671-1122.2021.04.005
15	GAO F, SU Y L, NIU X H, et al. Mongolian-Chinese neural machine translation based on Transformer [J]. Computer Applications and Software, 2020, 37(2): 141-146, 225. DOI: 10.3969/j.issn.1000-386x.2020.02.022
16	HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735
17	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
18	KATHAROPOULOS A, VYAS A, PAPPAS N, et al. Transformers are RNNs: fast autoregressive transformers with linear attention [C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 5156-5165.
19	MA M Y, CHEN W, WU L F. CNN_BiLSTM network based intrusion detection method [J]. Computer Engineering and Applications, 2022, 58(10): 116-124.
20	LI C, CHEN H, LI J X. An effectiveness evaluation indicator system based on multi-scale parallel convolution neural network [J]. Electronics Optics and Control, 2021, 28(11): 31-34, 93. DOI: 10.3969/j.issn.1671-637X.2021.11.007
21	XIAO K, LIU T Y, SUN X Y, et al. Intrusion detection method based on incremental convolution neural network [J]. Journal of Computer Applications, 2020, 40(S2): 73-79.
22	ZHAO K, SHI L B. Transient stability assessment of power system based on improved one-dimensional convolutional neural network [J]. Power System Technology, 2021, 45(8): 2945-2957.
