Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (4): 985-998. DOI: 10.11772/j.issn.1001-9081.2021071267

• The 36th CCF China Computer Applications Conference (CCF NCCA 2021) •

Research progress on binary code similarity search

XIA Bing1,2, PANG Jianmin1, ZHOU Xin1, SHAN Zheng1

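  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001
  2. Frontier Information Technology Research Institute, Zhongyuan University of Technology, Zhengzhou 450007
  • Received: 2021-07-15 Revised: 2021-08-23 Accepted: 2021-08-30 Online: 2021-08-23 Published: 2022-04-10
  • Corresponding author: PANG Jianmin
  • About the authors: XIA Bing (born 1981), male, from Yongcheng, Henan; associate professor, Ph.D. candidate, CCF member. His main research interests include network security and reverse engineering.
    ZHOU Xin (born 1994), male, from Shenyang, Liaoning; Ph.D. candidate. His main research interests include network security.
    SHAN Zheng (born 1977), male, from Shenyang, Liaoning; professor, Ph.D., CCF member. His main research interests include quantum computing.
  • Supported by:
    National Natural Science Foundation of China (61802435); "Advanced Industrial Internet Security Platform" Project of Zhijiang Laboratory (2018FD0ZX01); Key Scientific Research Project of Higher Education Institutions of Henan Province (21A520054)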

Research progress on binary code similarity search

Bing XIA1,2, Jianmin PANG1, Xin ZHOU1, Zheng SHAN1

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou Henan 450001, China
  2. Frontier Information Technology Research Institute, Zhongyuan University of Technology, Zhengzhou Henan 450007, China
  • Received: 2021-07-15 Revised: 2021-08-23 Accepted: 2021-08-30 Online: 2021-08-23 Published: 2022-04-10
  • Contact: Jianmin PANG
  • About author: XIA Bing, born in 1981, Ph.D. candidate, associate professor. His research interests include network security and reverse engineering.
    ZHOU Xin, born in 1994, Ph.D. candidate. His research interests include network security.
    SHAN Zheng, born in 1977, Ph.D., professor. His research interests include quantum computing.
  • Supported by:
    National Natural Science Foundation of China (61802435); Advanced Industrial Internet Security Platform Program of Zhijiang Laboratory (2018FD0ZX01); Key Research Project of Henan Universities and Colleges (21A520054)

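Abstract:

With the rapid development of the Internet of Things and the industrial Internet, research on cyberspace security has drawn growing attention from industry and academia. Because source code is unavailable, binary code similarity search has become a key technology for vulnerability mining and malicious code analysis. First, starting from the basic concepts of binary code similarity search, a framework for binary code similarity search systems is presented. Then, centered on similarity techniques, the state of development of syntactic, semantic and pragmatic similarity search for binary code is systematically introduced. Next, existing solutions are summarized and compared from the perspectives of binary hashing, instruction sequences, graph structures, basic block semantics, feature learning, debugging information recovery and recognition of high-level function semantics. Finally, future directions and prospects for binary code similarity search are discussed.

Keywords: binary code, code search, code comparison, basic block semantics, semantic recognition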

Abstract:

With the rapid development of the Internet of Things (IoT) and the industrial Internet, cyberspace security research has received increasing attention from industry and academia. Because source code is unavailable, binary code similarity search has become a key technology for vulnerability mining and malicious code analysis. Firstly, the basic concepts of binary code similarity search were introduced and a framework for binary code similarity search systems was given. Secondly, the development status of syntactic, semantic and pragmatic similarity search for binary code was reviewed. Then, the existing solutions were summarized and compared from the perspectives of binary hashing, instruction sequences, graph structures, basic block semantics, feature learning, debugging information recovery and recognition of high-level function semantics. Finally, the future development directions and prospects of binary code similarity search were discussed.

Key words: binary code, code search, code comparison, basic block semantics, semantic recognition

CLC number: