Journal of Computer Applications, 2022, Vol. 42, Issue (10): 3170-3176. DOI: 10.11772/j.issn.1001-9081.2021081548

• Computer software technology •

Static code defect detection method based on deep semantic fusion

Jingyun CHENG, Buhong WANG, Peng LUO

  1. College of Information and Navigation, Air Force Engineering University, Xi’an Shaanxi 710077, China
  • Received: 2021-08-31 Revised: 2021-11-20 Accepted: 2021-11-21 Online: 2022-01-07 Published: 2022-10-10
  • Contact: Jingyun CHENG
  • About author: CHENG Jingyun (first contact), born in 1998 in Chongqing, M. S. candidate. His research interests include information security. E-mail: 1508458583@qq.com
    WANG Buhong, born in 1975 in Taiyuan, Shanxi, Ph. D., professor. His research interests include information security, physical layer security, and artificial intelligence security.
    LUO Peng, born in 1995 in Yancheng, Jiangsu, Ph. D. candidate. His research interests include information security.

Abstract:

As the scale and complexity of computer software keep growing, code defects in software have become a serious threat to public safety. To address the poor extensibility of static analysis tools, as well as the coarse detection granularity and unsatisfactory detection performance of existing methods, a static code defect detection method based on program slicing and semantic feature fusion was proposed. Firstly, data flow and control flow analysis was performed on key points in the source code, and a program slicing method based on Interprocedural Finite Distributive Subset (IFDS) was adopted to obtain code snippets composed of multiple lines of statements related to code defects. Then, semantically related vector representations of the code snippets were obtained by word embedding, and an appropriate code snippet length was selected while preserving accuracy. Finally, Text Convolutional Neural Network (TextCNN) and Bidirectional Gated Recurrent Unit (BiGRU) were used to extract the local key features and the contextual sequence features of the code snippets respectively, and the proposed method was used to detect slice-level code defects. Experimental results show that the proposed method can effectively detect different types of code defects and significantly outperforms the static analysis tool Flawfinder. At fine granularity, the IFDS slicing method further improves the F1 score and accuracy, which reach 89.64% and 92.08% respectively. Compared with existing methods based on program slicing, when the key points are Application Programming Interfaces (APIs) or variables, the proposed method achieves higher F1 scores of 89.69% and 89.74% and higher accuracies of 92.15% and 91.98%, respectively. The proposed method therefore offers better overall detection performance without significantly increasing time complexity.

Key words: defect detection, program slicing, semantic analysis, deep learning, feature fusion
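
The slicing step described in the abstract can be illustrated with a minimal sketch. It is not the IFDS-based slicer used in the paper: it assumes a hand-written dependence graph for a hypothetical six-line C fragment and computes a backward slice from an API key point (`memcpy`) by plain graph reachability. The names `PROGRAM`, `DEPENDS_ON`, and `backward_slice` are illustrative and do not come from the paper.

```python
from collections import deque

# Hypothetical toy program, keyed by line number (an assumption for illustration).
PROGRAM = {
    1: "char buf[8];",
    2: "int n = read_len();",
    3: "int m = 4;",
    4: "if (n > 0)",
    5: "    memcpy(buf, src, n);",
    6: 'printf("%d", m);',
}

# Hand-written dependence edges: statement -> statements it depends on.
# Data dependence comes from shared variables, control dependence from the if.
DEPENDS_ON = {
    4: {2},        # the condition reads n
    5: {1, 2, 4},  # memcpy uses buf and n and is guarded by the if
    6: {3},        # printf reads m
}


def backward_slice(key_point, depends_on):
    """Return the sorted line numbers the key point transitively depends on."""
    seen = {key_point}
    queue = deque([key_point])
    while queue:
        stmt = queue.popleft()
        for dep in depends_on.get(stmt, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


# Slice with respect to the memcpy call on line 5.
slice_lines = backward_slice(5, DEPENDS_ON)
for line in slice_lines:
    print(line, PROGRAM[line])
```

In this sketch the slice for the `memcpy` key point keeps lines 1, 2, 4, and 5 and drops the unrelated statements on lines 3 and 6, giving the kind of multi-line snippet that would then be embedded and classified.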

CLC number: