Journal of Computer Applications


Source code vulnerability detection method based on Transformer-GCN

  

  • Received: 2024-07-17; Revised: 2024-10-31; Online: 2024-12-03; Published: 2024-12-03


梁辰1,王奕森1,魏强2,杜江1   

  1. Information Engineering University
  2. School of Cyberspace Security, Information Engineering University, Zhengzhou 450002
  • Corresponding author: 梁辰

Abstract: Existing deep learning-based methods for source code vulnerability detection often lose much of the syntax and semantics of the target code, and their neural network models assign weights to graph nodes and edges suboptimally. To address these issues, a method named VulATGCN is proposed that detects source code vulnerabilities using the Code Property Graph (CPG) and an adaptive Transformer-Graph Convolutional Network (AT-GCN). First, VulATGCN represents source code as a CPG and vectorizes the nodes with CodeBERT. Graph centrality analysis is then used to extract deep structural features, so that the syntax and semantic information of the code is captured along multiple dimensions. Finally, the AT-GCN model is designed to integrate the strengths of the Transformer self-attention mechanism, which excels at capturing long-range dependencies, and the Graph Convolutional Network (GCN), which is proficient at capturing local features. This design enables fusion learning and precise extraction of features from regions of differing importance. Experiments on real-world vulnerability datasets, including Big-Vul and SARD, show that VulATGCN reaches an F1 score of 82.9%, an average improvement of approximately 52.9% over deep learning-based vulnerability detection methods such as VulSniper, VulMPFF, and MGVD.
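The pipeline outlined in the abstract (node embeddings, a centrality feature, a GCN branch for local structure, a self-attention branch for long-range dependencies, and a fusion step) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy graph, the 2-dimensional placeholder vectors standing in for CodeBERT embeddings, the choice of degree centrality, the omission of learned weight matrices, and the fixed mixing weight `alpha` are all assumptions made for illustration only.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]

def gcn_propagate(adj, feats):
    """One GCN propagation step, D^-1/2 (A + I) D^-1/2 X, without learned weights."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return matmul(norm, feats)

def self_attention(feats):
    """Scaled dot-product self-attention with identity Q/K/V projections (assumption)."""
    d = len(feats[0])
    scores = matmul(feats, [list(col) for col in zip(*feats)])
    weights = []
    for row in scores:
        scaled = [s / math.sqrt(d) for s in row]
        top = max(scaled)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return matmul(weights, feats)

def fuse(local_feats, global_feats, alpha=0.5):
    """Hypothetical fusion: a fixed convex mix of the local and global branches."""
    return [[alpha * l + (1 - alpha) * g for l, g in zip(lr, gr)]
            for lr, gr in zip(local_feats, global_feats)]

# Toy graph standing in for a CPG: node 1 is linked to nodes 0, 2 and 3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 1],
       [0, 1, 0, 0],
       [0, 1, 0, 0]]
# Placeholder 2-d vectors standing in for CodeBERT node embeddings.
embeddings = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.8], [0.5, 0.5]]

centrality = degree_centrality(adj)
# Append the centrality score to each node embedding as a structural feature.
node_feats = [emb + [c] for emb, c in zip(embeddings, centrality)]
fused = fuse(gcn_propagate(adj, node_feats), self_attention(node_feats))
```

In this sketch `fused` holds one 3-dimensional vector per node. A trainable model would presumably replace the identity projections and the fixed `alpha` with learned parameters and add a graph-level readout and classifier, none of which is specified in the abstract.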

Key words: code vulnerability detection, code property graph, graph neural network, centrality analysis, self-attention mechanism

