
Protocol format signature extraction algorithm based on discrete sequence messages

Li Yang (李阳)1, Li Qing (李青)1, Zhang Xia (张霞)2

  1. PLA Information Engineering University
  2. PLA Information Engineering Institute
  • Received: 2016-09-28 Revised: 2016-11-11 Published: 2016-11-11
  • Corresponding author: Li Yang (李阳)

Format signature construction method based on separate protocol message

  • Received:2016-09-28 Revised:2016-11-11 Online:2016-11-11

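Abstract: For discrete sequence messages that lack session information, an algorithm for automatically extracting protocol format signatures from discrete sequence messages (SPMbFSC) is proposed. After clustering the discrete sequence messages, SPMbFSC extracts protocol keywords with an improved frequent pattern mining algorithm and then selects among those keywords to obtain the protocol format signatures. Simulation results show that SPMbFSC achieves a recognition rate above 95% for each of six protocols when identification is performed at the granularity of a single message, and a recognition rate of up to 90% at the granularity of a session. Under the same experimental conditions, it outperforms the adaptive signature extraction method (AdapSig). The experimental results show that SPMbFSC does not depend on the completeness of session data and is therefore better suited to practical settings in which reception constraints leave session information incomplete.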

Key words: discrete sequence messages, protocol keyword extraction, adaptive signature mining, format signature, protocol identification

Abstract: To handle the case in which only separate protocol messages, rather than complete flows, are received, a novel Separate Protocol Message based Format Signature Construction (SPMbFSC) algorithm is proposed. SPMbFSC extracts the protocol format signature automatically from a protocol's separate messages instead of from flows. First, SPMbFSC clusters the separate messages according to the Euclidean distances among them. Then, within each message cluster, it extracts the keywords. Finally, it obtains the format signature by filtering and selecting the keywords. Simulation results show that, when SPMbFSC is used for separate-message classification, the accuracy for each of six protocols is above 95%. In flow classification, SPMbFSC achieves higher accuracy than the Adaptive Application Signature Extraction Method (AdapSig), with the accuracy for each protocol above 90%. The experimental results indicate that SPMbFSC does not depend on flows and is more practical in situations where only separate protocol messages are received.
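
The three steps outlined in the abstract (clustering, keyword extraction, keyword selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The byte-histogram features, the greedy leader clustering, the fixed-length n-grams standing in for frequent pattern mining, and all function names and thresholds (`radius`, `n`, `min_support`) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def byte_histogram(msg):
    """Normalised 256-bin byte-frequency vector (assumed feature, not from the paper)."""
    hist = [0.0] * 256
    for b in msg:
        hist[b] += 1.0
    total = max(len(msg), 1)
    return [h / total for h in hist]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_messages(messages, radius=0.4):
    """Greedy leader clustering: a message joins the first cluster whose
    leader lies within `radius`, otherwise it starts a new cluster."""
    clusters = []  # list of (leader_vector, member_messages)
    for msg in messages:
        vec = byte_histogram(msg)
        for leader, members in clusters:
            if euclidean(leader, vec) <= radius:
                members.append(msg)
                break
        else:
            clusters.append((vec, [msg]))
    return [members for _, members in clusters]


def frequent_ngrams(cluster, n=4, min_support=0.8):
    """Byte n-grams present in at least `min_support` of the cluster's messages."""
    counts = Counter()
    for msg in cluster:
        # A set, so each n-gram is counted at most once per message.
        counts.update({msg[i:i + n] for i in range(len(msg) - n + 1)})
    threshold = min_support * len(cluster)
    return {gram for gram, c in counts.items() if c >= threshold}


def select_signatures(keywords_per_cluster):
    """Keep only keywords that are frequent in exactly one cluster."""
    seen = Counter(g for kws in keywords_per_cluster for g in kws)
    return [{g for g in kws if seen[g] == 1} for kws in keywords_per_cluster]


def extract_signatures(messages):
    """Run the three stages in order and return one keyword set per cluster."""
    clusters = cluster_messages(messages)
    keywords = [frequent_ngrams(c) for c in clusters]
    return clusters, select_signatures(keywords)
```

Calling `extract_signatures` on a list of `bytes` objects returns the clusters together with one candidate signature set per cluster. No session or flow grouping is needed at any stage, which is the property the abstract emphasises.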

Key words: separate protocol message, protocol keyword extraction, automatic format signature mining, format signature, protocol classification

CLC number: