Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (4): 1109-1114. DOI: 10.11772/j.issn.1001-9081.2019081380

• Network and communications •

Structural signature extraction method for mobile application recognition

SHEN Liang, WANG Xin, CHEN Shuhui

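  1. College of Computer, National University of Defense Technology, Changsha 410073
  • Received: 2019-08-08 Revised: 2019-10-10 Online: 2020-04-10 Published: 2019-11-18
  • Corresponding author: CHEN Shuhui
  • About the authors: SHEN Liang (1989-), male, born in Bengbu, Anhui, M.S. candidate; research interests: mobile application traffic classification and identification. WANG Xin (1991-), male, born in Jinan, Shandong, Ph.D. candidate; research interest: cyberspace security. CHEN Shuhui (1974-), male, born in Yiyang, Hunan, research fellow, Ph.D.; research interests: cyberspace security, network architecture, and high-speed Internet monitoring.
  • Supported by:
    National Key Research and Development Program of China (2016QY11W2004).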

Structural signature extraction method for mobile application recognition

SHEN Liang, WANG Xin, CHEN Shuhui   

  1. College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China
  • Received: 2019-08-08 Revised: 2019-10-10 Online: 2020-04-10 Published: 2019-11-18
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China(2016QY11W2004).

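Abstract: To meet the needs of mobile application traffic monitoring and behavior analysis, and to effectively identify the application to which mobile network traffic belongs, a structural signature extraction method for Hypertext Transfer Protocol (HTTP) flows is proposed. The research data were collected with a self-developed traffic collection tool based on a Virtual Private Network (VPN), which can accurately label the application that each flow belongs to. In the signature extraction stage, the composition of the signatures is not designed in advance; instead, the structural signatures of an application's HTTP flows are obtained through flow clustering, longest common subsequence extraction, and character substitution. Signatures were extracted from 117 772 HTTP flows of 42 applications and used to identify the 50 387 HTTP flows in the test set. The proposed method achieved an average accuracy of 99%, an average recall of 90.63%, and a maximum false positive rate for a single application of 0.52%. The experimental results show that the structural signature extraction method can effectively identify mobile application traffic.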

Key words: traffic collection, mobile application identification, traffic classification, deep packet inspection, signature extraction
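
The three extraction steps named in the abstract (flow clustering, longest common subsequence, character substitution) can be illustrated with a minimal Python sketch. The abstract does not give the details of any step, so everything here is an assumption made for illustration: the tokenizer, the greedy similarity-threshold clustering, the threshold value 0.6, and the choice to substitute every variable gap with a non-greedy wildcard. It is not the authors' implementation.

```python
import re
from functools import reduce

# Assumed tokenizer: runs of word-like characters, or single delimiters.
TOKEN = re.compile(r"[A-Za-z0-9_.-]+|[^A-Za-z0-9_.-]")

def tokenize(flow):
    return TOKEN.findall(flow)

def lcs(a, b):
    """Longest common subsequence of two token lists (classic DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def similarity(a, b):
    return len(lcs(a, b)) / max(len(a), len(b), 1)

def cluster_flows(flows, threshold=0.6):
    """Hypothetical greedy clustering: join the first cluster whose
    representative (its first member) is similar enough."""
    clusters = []
    for tokens in map(tokenize, flows):
        for members in clusters:
            if similarity(members[0], tokens) >= threshold:
                members.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters

def signature(members):
    """Fold LCS over a cluster, then substitute the variable gaps
    between common tokens with a wildcard (assumed substitution rule)."""
    common = reduce(lcs, members)
    pattern = ".*?".join(re.escape(tok) for tok in common)
    return re.compile(pattern, re.DOTALL)

def extract_signatures(flows):
    return [signature(members) for members in cluster_flows(flows)]

if __name__ == "__main__":
    # Made-up request lines for one hypothetical application.
    sample = [
        "GET /api/v2/feed?uid=1001&ts=1565222400 HTTP/1.1",
        "GET /api/v2/feed?uid=2002&ts=1565222999 HTTP/1.1",
        "POST /log/upload?sid=abc HTTP/1.1",
    ]
    for sig in extract_signatures(sample):
        print(sig.pattern)
```

Because the common tokens form a subsequence of every flow in the cluster, the resulting pattern matches each cluster member by construction. A stricter substitution rule would reduce false positives at the cost of recall.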

Abstract: To meet the needs of mobile application traffic monitoring and behavior analysis, a structural signature extraction method for Hypertext Transfer Protocol (HTTP) traffic was proposed to effectively identify the application to which mobile network traffic belongs. A self-developed Virtual Private Network (VPN)-based traffic collection tool, which can accurately identify the application that each flow belongs to, was used to obtain the research data. In the signature extraction stage, the signature composition was not designed in advance; the structural signatures of the HTTP traffic were obtained through three steps: flow clustering, longest common subsequence extraction, and character substitution. Signatures were extracted from 117 772 HTTP flows of 42 applications and used to identify the 50 387 HTTP flows in the test set. The proposed method has an average accuracy of 99%, an average recall of 90.63%, and a maximum false positive rate for a single application of 0.52%. The experimental results show that the proposed structural signature extraction method can effectively identify the traffic of mobile applications.

Key words: traffic collection, mobile application identification, traffic classification, Deep Packet Inspection (DPI), signature extraction
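
The three figures reported in the abstract (average accuracy, average recall, maximum false positive rate for a single application) can be computed per application from labelled test flows. The sketch below is only one plausible reading: the abstract does not define its metrics, so treating "accuracy" as per-application precision, averaging over applications with equal weight, and representing an unidentified flow as `None` are all assumptions.

```python
from collections import Counter

def per_app_metrics(y_true, y_pred):
    """Per-application precision, recall and false positive rate.

    y_true: true application label of each test flow.
    y_pred: predicted label, or None when no signature matched (assumed).
    """
    total = len(y_true)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == truth:
            tp[truth] += 1
        else:
            fn[truth] += 1
            if pred is not None:
                fp[pred] += 1  # flow wrongly attributed to `pred`

    metrics = {}
    for app in sorted(set(y_true)):
        predicted = tp[app] + fp[app]   # flows labelled as this app
        actual = tp[app] + fn[app]      # flows truly from this app
        negatives = total - actual      # flows from every other app
        metrics[app] = {
            "precision": tp[app] / predicted if predicted else 0.0,
            "recall": tp[app] / actual if actual else 0.0,
            "fpr": fp[app] / negatives if negatives else 0.0,
        }
    return metrics

def summarize(metrics):
    """Unweighted averages over applications plus the worst-case FPR."""
    n = len(metrics)
    return {
        "avg_precision": sum(m["precision"] for m in metrics.values()) / n,
        "avg_recall": sum(m["recall"] for m in metrics.values()) / n,
        "max_fpr": max(m["fpr"] for m in metrics.values()),
    }

if __name__ == "__main__":
    # Toy labels, not data from the paper.
    truth = ["a", "a", "a", "b", "b"]
    pred = ["a", "a", None, "b", "a"]
    print(summarize(per_app_metrics(truth, pred)))
```

Under this reading, a flow that matches no signature lowers recall for its true application but does not count as a false positive for any application, which is consistent with a high precision alongside a lower recall.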

CLC number: