Journal of Computer Applications, 2024, Vol. 44, Issue (4): 1248-1258. DOI: 10.11772/j.issn.1001-9081.2023040551

• Computer software technology •

Survey of code similarity detection technology

Xiangjie SUN1,2, Qiang WEI2, Yisen WANG2, Jiang DU2

  1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou Henan 450002, China
  2. School of Cyberspace Security, Information Engineering University, Zhengzhou Henan 450001, China
  • Received: 2023-05-09 Revised: 2023-07-13 Accepted: 2023-07-14 Online: 2023-12-04 Published: 2024-04-10
  • Contact: Yisen WANG
  • About author: SUN Xiangjie, born in 1999, M. S. candidate. His research interests include software composition analysis.
    WEI Qiang, born in 1979, Ph. D., professor. His research interests include industrial control system security.
    WANG Yisen, born in 1990, Ph. D., associate professor. His research interests include network security.
    DU Jiang, born in 1990, Ph. D. candidate. His research interests include binary code similarity.
  • Supported by:
    National Key Research & Development Program(2019QY0502)



Code reuse brings convenience to software development, but it also introduces security risks, such as accelerated vulnerability propagation and malicious code plagiarism. Code similarity detection computes the similarity between pieces of code by analyzing their lexical, syntactic, semantic and other information. It is one of the most effective techniques for identifying code reuse, and it is also a program security analysis technique that has developed rapidly in recent years. First, the latest technical progress in code similarity detection was systematically reviewed, and current techniques were classified. According to whether the target code was open source, they were divided into source code similarity detection and binary code similarity detection, and each class was then further subdivided according to programming language and instruction set. Then, the ideas and research results of each technique were summarized, successful applications of machine learning in code similarity detection were analyzed, and the advantages and disadvantages of existing techniques were discussed. Finally, the development trends of code similarity detection were outlined to provide a reference for researchers in the field.
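As a rough illustration of the lexical end of this spectrum, the sketch below compares two source snippets by the Jaccard overlap of their normalized token trigrams. It is a minimal example under stated assumptions, not a method taken from the survey: the helper names (`normalize_tokens`, `ngrams`, `lexical_similarity`), the tiny keyword set, and the choice of trigrams are all hypothetical.

```python
import re

# Assumption: a tiny illustrative keyword subset, not a full language grammar.
KEYWORDS = {"def", "return", "if", "else", "for", "while", "in"}
# Identifiers, integer literals, or any single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_tokens(source):
    """Split source text into tokens, abstracting identifiers and numbers."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)        # keep keywords as-is
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")       # any identifier becomes ID
        elif tok.isdigit():
            tokens.append("NUM")      # any integer literal becomes NUM
        else:
            tokens.append(tok)        # punctuation and operators
    return tokens


def ngrams(tokens, n=3):
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lexical_similarity(a, b, n=3):
    """Jaccard similarity of the normalized token n-gram sets of a and b."""
    ga = ngrams(normalize_tokens(a), n)
    gb = ngrams(normalize_tokens(b), n)
    if not ga and not gb:
        return 1.0                    # two empty inputs are treated as identical
    return len(ga & gb) / len(ga | gb)


snippet_a = "def add(a, b):\n    return a + b"
snippet_b = "def sum2(x, y):\n    return x + y"   # same code, renamed identifiers
snippet_c = "while True: pass"

score_renamed = lexical_similarity(snippet_a, snippet_b)
score_unrelated = lexical_similarity(snippet_a, snippet_c)
```

Because identifiers are abstracted before comparison, the renamed copy scores as identical to the original, while the unrelated snippet shares no trigrams with it. A purely lexical measure like this misses reordered or semantically equivalent code, which is where syntactic and semantic analysis come in.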

Key words: binary code similarity, source code similarity, cross language code similarity, deep learning, code clone



