Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (4): 1151-1159. DOI: 10.11772/j.issn.1001-9081.2022030382

• Cyberspace security •

Deep explainable method for encrypted traffic classification

Jian CUI1,2, Kailang MA1,2, Yu SUN1,2, Dou WANG3, Junliang ZHOU4

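  1. School of Cyber Science and Technology, Beihang University, Beijing 100191
    2. Key Laboratory of Ministry of Industry and Information Technology for Cyberspace Security (Beihang University), Beijing 100191
    3. Zhejiang Zhenergy Digital Technology Company Limited, Hangzhou 311100
    4. Zhejiang Petroleum Integrated Energy Sales Company Limited, Hangzhou 310000
  • Received: 2022-04-02 Revised: 2022-09-13 Accepted: 2022-09-14 Online: 2023-01-11 Published: 2023-04-10
  • Corresponding author: Junliang ZHOU
  • About the authors: CUI Jian, born in 1980, male, native of Beijing, Ph. D., lecturer. His research interests include hardware system security and industrial internet security;
    MA Kailang, born in 1999, male, native of Wenshui, Shanxi, M. S. candidate. His research interests include artificial intelligence and network security;
    SUN Yu, born in 1985, male, native of Yantai, Shandong, Ph. D., associate professor. His research interests include artificial intelligence and network security;
    WANG Dou, born in 1993, male, native of Shaoxing, Zhejiang, M. S. candidate. His research interests include industrial informatization;
  • Supported by:
    National Natural Science Foundation of China (32071775)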

Deep explainable method for encrypted traffic classification

Jian CUI1,2, Kailang MA1,2, Yu SUN1,2, Dou WANG3, Junliang ZHOU4

  1. School of Cyber Science and Technology, Beihang University, Beijing 100191, China
    2. Key Laboratory of Ministry of Industry and Information Technology for Cyberspace Security (Beihang University), Beijing 100191, China
    3. Zhejiang Zhenergy Digital Technology Company Limited, Hangzhou Zhejiang 311100, China
    4. Zhejiang Petroleum Integrated Energy Sales Company Limited, Hangzhou Zhejiang 310000, China
  • Received: 2022-04-02 Revised: 2022-09-13 Accepted: 2022-09-14 Online: 2023-01-11 Published: 2023-04-10
  • Contact: Junliang ZHOU
  • About author: CUI Jian, born in 1980, Ph. D., lecturer. His research interests include hardware system security and industrial internet security.
    MA Kailang, born in 1999, M. S. candidate. His research interests include artificial intelligence and network security.
    SUN Yu, born in 1985, Ph. D., associate professor. His research interests include artificial intelligence and network security.
    WANG Dou, born in 1993, M. S. candidate. His research interests include industrial informatization.
  • Supported by:
    National Natural Science Foundation of China (32071775)

Abstract:

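Current deep learning models clearly outperform traditional machine learning methods on encrypted traffic classification. However, because such models are inherently black boxes, users cannot learn how a model arrives at its classification decisions. To improve the credibility of deep learning models while preserving classification accuracy, an explainable method for deep learning models of encrypted traffic classification is proposed. It combines a prototype-based active explanation at the traffic level with a passive explanation at the packet level based on feature similarity saliency maps. First, the prototype-based Flow Prototype Network (FlowProtoNet) automatically extracts typical segments of each traffic class during training, namely traffic prototypes. Second, at test time the similarity between the traffic under test and the prototypes of each class is computed, so that classification is accompanied by an explanation that traces back to the training set. Third, to further improve visual explainability, the gradient-weighted feature Similarity Saliency Map (Grad-SSM) method is proposed. Grad-SSM first weights the feature map by the gradient to filter out regions irrelevant to the classification decision. It then computes the Earth Mover's Distance (EMD) between the traffic under test and the prototypes extracted by FlowProtoNet to obtain a similarity matrix, so that comparing the test traffic with the prototypes of its class further focuses the attention heatmap. On the ISCX VPN-nonVPN dataset, the proposed method reaches an accuracy of 96.86%, on par with non-explainable artificial intelligence methods, while FlowProtoNet additionally provides a basis for each classification by reporting the similarity to the prototypes. The proposed method also has stronger visual explainability, with attention focused more tightly on the key packets in the traffic.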

Key words: encrypted traffic classification, explainable artificial intelligence, prototype, traceability, visual explainability
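
The abstract describes classifying a flow by its similarity to learned class prototypes and reporting that similarity as the basis for the decision. The following is only a minimal sketch of that idea, not the authors' FlowProtoNet: the feature vectors, the `PROTOTYPES` table, the class names, and the similarity function `1 / (1 + squared distance)` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity(a, b):
    """Assumed similarity in (0, 1]; identical vectors score exactly 1.0."""
    return 1.0 / (1.0 + squared_distance(a, b))


def classify(flow_features, prototypes):
    """Return (predicted_class, scores).

    scores maps each class label to the similarity between the flow and the
    closest prototype of that class, which doubles as the explanation.
    """
    scores = {
        label: max(similarity(flow_features, p) for p in protos)
        for label, protos in prototypes.items()
    }
    predicted = max(scores, key=scores.get)
    return predicted, scores


# Hypothetical prototypes: two classes with two 2-D prototypes each.
PROTOTYPES = {
    "chat": [[0.0, 0.0], [0.0, 1.0]],
    "streaming": [[4.0, 4.0], [5.0, 4.0]],
}

label, scores = classify([0.0, 2.0], PROTOTYPES)
print(label, scores)
```

In this toy setup the per-class scores are what a user would inspect: the predicted class is simply the one whose nearest prototype is most similar to the flow under test.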

Abstract:

Current deep learning models achieve significant performance advantages over traditional machine learning methods in encrypted traffic classification tasks. However, owing to the inherent black-box nature of deep learning models, users cannot know the mechanism by which a model makes its classification decisions. To enhance the credibility of the deep learning model while preserving classification accuracy, an explainable method for deep learning models of encrypted traffic classification was proposed, comprising a prototype-based traffic-level active explanation and a packet-level passive explanation based on feature similarity saliency maps. Firstly, the prototype-based Flow Prototype Network (FlowProtoNet) was used to automatically extract typical segments of each traffic class during training, namely traffic prototypes. Secondly, the similarity between the tested traffic and the prototypes of each class was calculated during testing, so that classification and a traceability explanation back to the training set were obtained together. Thirdly, to further improve visual explainability, the Gradient Similarity Saliency Map (Grad-SSM) method was proposed. In Grad-SSM, regions irrelevant to the classification decision were first filtered out by weighting the feature map with the gradient. Then the Earth Mover's Distance (EMD) between the tested traffic and the prototypes extracted by FlowProtoNet was calculated to obtain a similarity matrix, so that comparing the test traffic with the prototypes of its class focused the attention heatmap further. On the ISCX VPN-nonVPN dataset, the accuracy of the proposed method reaches 96.86%, which is comparable to that of non-explainable methods, and FlowProtoNet additionally provides a basis for classification by giving the similarity to the prototypes. At the same time, the proposed method has stronger visual explainability and focuses more on the key packets in the traffic.

Key words: encrypted traffic classification, explainable artificial intelligence, prototype, traceability, visual explainability
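
The two ingredients the abstract names for Grad-SSM, gradient weighting of the feature map and the Earth Mover's Distance, can be illustrated with a simplified sketch. This is not the authors' Grad-SSM: the channel weighting below follows the general Grad-CAM pattern (mean gradient per channel, weighted sum, negative values clipped), and the EMD is computed only for two 1-D profiles on the same unit-spaced support with equal total mass, rather than as the similarity matrix the abstract mentions. All function names and numbers are hypothetical.

```python
def grad_weighted_saliency(feature_maps, gradients):
    """Grad-CAM-style saliency over positions (e.g. packets in a flow).

    feature_maps[c][t] is the activation of channel c at position t and
    gradients[c][t] is the gradient of the class score for that activation.
    Each channel is weighted by its mean gradient, the channels are summed,
    and negative values are clipped to zero.
    """
    length = len(feature_maps[0])
    weights = [sum(g) / len(g) for g in gradients]
    return [
        max(0.0, sum(w * fm[t] for w, fm in zip(weights, feature_maps)))
        for t in range(length)
    ]


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize an all-zero profile")
    return [v / total for v in values]


def emd_1d(p, q):
    """EMD between two equal-mass histograms on the same unit-spaced support.

    In one dimension this equals the sum of absolute differences between the
    running (cumulative) sums of the two histograms.
    """
    carried = 0.0
    cost = 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi
        cost += abs(carried)
    return cost


# Hypothetical activations and gradients: 2 channels, 3 positions.
feature_maps = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
gradients = [[0.5, 0.5, 0.5], [-1.0, -1.0, -1.0]]

saliency = grad_weighted_saliency(feature_maps, gradients)
test_profile = normalize(saliency)
prototype_profile = [0.0, 0.0, 1.0]  # hypothetical prototype saliency profile
distance = emd_1d(test_profile, prototype_profile)
print(saliency, distance)  # [0.5, 0.0, 0.0] 2.0
```

A smaller distance means the test flow concentrates its relevance at the same positions as the prototype; here all mass has to move two positions, giving a distance of 2.0.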
