Method for determining boundaries of binary protocol format keywords based on optimal path search

Journal of Computer Applications, 2018, Vol. 38, Issue (6): 1726-1731. DOI: 10.11772/j.issn.1001-9081.2017112846

YAN Xiaoyong, LI Qing

Information Engineering University, Zhengzhou, Henan 450001, China

Received: 2017-12-05 Revised: 2018-01-09 Online: 2018-06-10 Published: 2018-06-13
Corresponding author: YAN Xiaoyong
About the authors: YAN Xiaoyong (born 1993), male, from Longxian, Shaanxi, master's student; main research interests: data mining and protocol reverse analysis. LI Qing (born 1976), female, from Zhengding, Hebei, associate professor, Ph.D.; main research interests: protocol reverse analysis, visible light communication, wireless ad hoc networks and sensor networks.

Abstract

To address the problem of field segmentation in the reverse analysis of binary protocol message formats, an algorithm that takes format keywords as the target of reverse analysis was proposed. It determines the boundaries of binary protocol format keywords optimally by combining an improved n-gram algorithm with an optimal path search algorithm. Firstly, the position factor was introduced into the n-gram algorithm, giving a format keyword boundary extraction algorithm based on iterative n-gram-position. This addresses two weaknesses of the plain n-gram algorithm: the value of n is difficult to determine, and the boundaries of format keywords at fixed offset positions are difficult to extract. Then, a branch metric was defined from the boundary hit ratio of frequent items and the left and right branch information entropies, and constraint conditions were constructed from the difference in the rate of change of n-gram-position values between keywords and non-keywords. On this basis, a format keyword boundary selection algorithm based on optimal path search was proposed, which determines the boundaries of format keywords jointly and optimally. Tests on five protocol message datasets of different types (AIS1, AIS18, ICMP00, ICMP03 and NetBios) show that the proposed algorithm accurately determines the format keyword boundaries of the different protocols, with F values all above 83%. Compared with the classical Variance of the Distribution of Variances (VDV) and AutoReEngine algorithms, the F value of the proposed algorithm is about 8 percentage points higher on average.

Keywords: binary protocol, format keyword, boundary determination, n-gram, optimal path search
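
The two ingredients named in the abstract, position-aware n-gram counting and branch information entropy, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes byte-granular messages (the paper's datasets may be processed at a different granularity), and the function names, the `min_support` threshold and the toy messages are all hypothetical. It only shows the general idea that an n-gram recurring at a fixed offset is a keyword candidate, and that high entropy in the symbol following it hints at a boundary.

```python
import math
from collections import Counter


def frequent_items(messages, n, min_support):
    """Return (offset, n-gram) pairs found in at least min_support of the messages.

    Hypothetical helper: keying the count on the offset as well as the n-gram
    is the position-aware part of the idea.
    """
    counts = Counter()
    for msg in messages:
        for offset in range(len(msg) - n + 1):
            counts[(offset, msg[offset:offset + n])] += 1
    threshold = min_support * len(messages)
    return {item: c for item, c in counts.items() if c >= threshold}


def entropy(counter):
    """Shannon entropy in bits of the empirical distribution held in a Counter."""
    total = sum(counter.values())
    return sum(-(c / total) * math.log2(c / total) for c in counter.values())


def right_branch_entropy(messages, offset, gram):
    """Entropy of the byte that follows `gram` when it occurs at `offset`."""
    end = offset + len(gram)
    followers = Counter(
        msg[end] for msg in messages
        if len(msg) > end and msg[offset:end] == gram
    )
    return entropy(followers) if followers else 0.0


# Toy data (an assumption, not from the paper): a fixed 2-byte header
# followed by varying content.
messages = [
    b"\x08\x00\x01\xaa",
    b"\x08\x00\x02\xbb",
    b"\x08\x00\x03\xcc",
    b"\x08\x00\x04\xdd",
]
candidates = frequent_items(messages, n=2, min_support=0.9)
inside = right_branch_entropy(messages, 0, b"\x08")      # next byte is always 0x00
at_edge = right_branch_entropy(messages, 0, b"\x08\x00")  # next byte varies
```

In this toy case only the pair at offset 0 is frequent, the entropy after the first byte is zero (still inside the keyword), and the entropy after the second byte is maximal for four messages, which is the kind of signal a branch metric could use to place a boundary at offset 2. A left branch entropy would be computed the same way on the preceding byte.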

CLC number: TP393

YAN Xiaoyong, LI Qing. Method for determining boundaries of binary protocol format keywords based on optimal path search[J]. Journal of Computer Applications, 2018, 38(6): 1726-1731.

