Journal of Computer Applications, 2025, Vol. 45, Issue (7): 2296-2303. DOI: 10.11772/j.issn.1001-9081.2024070998

• Cyberspace Security •

Source code vulnerability detection method based on Transformer-GCN

Chen LIANG (梁辰), Yisen WANG (王奕森), Qiang WEI (魏强), Jiang DU (杜江)

  1. School of Cyberspace Security, Information Engineering University, Zhengzhou 450001
  • Received: 2024-07-17  Revised: 2024-10-31  Accepted: 2024-10-31  Online: 2025-07-10  Published: 2025-07-10
  • Corresponding author: WANG Yisen
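  • About the authors: LIANG Chen (born 2000), male, native of Hefei, Anhui, M. S. candidate. His research interests include software component analysis.
    WANG Yisen (born 1990), male, native of Shenqiu, Henan, Ph. D., associate professor. His research interests include cyber security. E-mail: 851067568@qq.com
    WEI Qiang (born 1979), male, native of Nanchang, Jiangxi, Ph. D., professor. His research interests include software security and industrial control system security.
    DU Jiang (born 1990), male, native of Zhengzhou, Henan, Ph. D. candidate. His research interests include binary code similarity.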
  • Funding:
    Henan Province Key Research and Development Program (221111210300)

Source code vulnerability detection method based on Transformer-GCN

Chen LIANG, Yisen WANG, Qiang WEI, Jiang DU

  1. School of Cyberspace Security, Information Engineering University, Zhengzhou Henan 450001, China
  • Received: 2024-07-17  Revised: 2024-10-31  Accepted: 2024-10-31  Online: 2025-07-10  Published: 2025-07-10
  • Contact: Yisen WANG
  • About the authors: LIANG Chen, born in 2000, M. S. candidate. His research interests include software component analysis.
    WANG Yisen, born in 1990, Ph. D., associate professor. His research interests include cyber security.
    WEI Qiang, born in 1979, Ph. D., professor. His research interests include software security and industrial control system security.
    DU Jiang, born in 1990, Ph. D. candidate. His research interests include binary code similarity.
  • Supported by:
    Henan Province Key Research and Development Program (221111210300)

Abstract:

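Existing deep learning-based source code vulnerability detection methods suffer from problems such as severe loss of the syntax and semantics of the target code and unreasonable assignment of weights to the nodes (edges) of the target code graph by the neural network models. To address these problems, VulATGCN, a source code vulnerability detection method based on the Code Property Graph (CPG) and an Adaptive Transformer-Graph Convolutional Network (AT-GCN), is proposed. The method represents source code as a CPG, vectorizes its nodes with CodeBERT, and extracts deep structural features through graph centrality analysis, thereby capturing the syntactic and semantic information of the code from multiple dimensions. The AT-GCN model is then designed to combine the strength of the Transformer self-attention mechanism in capturing long-range dependencies with the strength of the Graph Convolutional Network (GCN) in capturing local features, enabling fused learning and precise extraction of features from regions of differing importance. Experimental results on the real-world vulnerability datasets Big-Vul and SARD show that VulATGCN achieves an average F1 score of 82.9%, which is 10.4% to 132.9% higher than that of deep learning-based vulnerability detection methods such as VulSniper, VulMPFF, and MGVD, with an average improvement of about 52.9%.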

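Keywords: source code vulnerability detection, Code Property Graph (CPG), graph neural network, centrality analysis, self-attention mechanism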
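The centrality step in the abstract can be illustrated with a minimal, stdlib-only Python sketch. The abstract does not say which centrality measures VulATGCN uses or how they are combined with the CodeBERT vectors, so the sketch assumes degree and closeness centrality computed on an undirected view of a toy CPG and simply appended to each node vector. The toy graph, the node names (`memcpy`, `check_len`, ...), the helper functions, and the zero placeholder embeddings are all hypothetical.

```python
from collections import deque


def undirected_adjacency(nodes, edges):
    """Merge directed CPG edges (AST/CFG/PDG) into an undirected neighbor map."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other n - 1 nodes each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """BFS closeness: reached / (sum of shortest-path distances), scaled by
    reached / (n - 1) so nodes in small disconnected pieces are not over-rated."""
    n = len(adj)
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        reached = len(dist) - 1
        total = sum(dist.values())
        if total == 0:
            scores[src] = 0.0
        else:
            scores[src] = (reached / total) * (reached / (n - 1))
    return scores


def augment_features(embeddings, adj):
    """Append [degree, closeness] to each node's embedding vector."""
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    return {v: list(vec) + [deg[v], clo[v]] for v, vec in embeddings.items()}


# Hypothetical toy CPG for: char buf[64]; len = read(); if (len > 0) memcpy(buf, src, len);
NODES = ["entry", "decl_buf", "read_len", "check_len", "memcpy", "exit"]
EDGES = [
    ("entry", "decl_buf"), ("decl_buf", "read_len"),     # control flow
    ("read_len", "check_len"), ("check_len", "memcpy"),  # control flow / control dependence
    ("memcpy", "exit"), ("check_len", "exit"),           # control flow
    ("read_len", "memcpy"), ("decl_buf", "memcpy"),      # data dependence on len and buf
]

ADJ = undirected_adjacency(NODES, EDGES)
# Zero vectors stand in for CodeBERT node embeddings (hypothetical, 4-dim for brevity).
EMBEDDINGS = {v: [0.0] * 4 for v in NODES}
FEATURES = augment_features(EMBEDDINGS, ADJ)

for v in NODES:
    d, c = FEATURES[v][-2:]
    print(f"{v:10s} degree={d:.3f} closeness={c:.3f}")
```

On this toy graph the unchecked `memcpy` sink receives the highest degree (0.8) and closeness (about 0.833), which is the kind of structural signal a centrality feature can add on top of purely lexical node embeddings.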

Abstract:

Existing deep learning-based methods for source code vulnerability detection suffer from severe loss of the syntax and semantics of the target code, and their neural network models assign weights to the nodes (edges) of the target code graph unreasonably. To address these issues, a source code vulnerability detection method named VulATGCN was proposed based on the Code Property Graph (CPG) and an Adaptive Transformer-Graph Convolutional Network (AT-GCN). In the method, the CPG was used to represent source code, CodeBERT was used to vectorize the nodes, and graph centrality analysis was employed to extract deep structural features, so that the syntactic and semantic information of the code was captured along multiple dimensions. Then, the AT-GCN model was designed to integrate the strengths of the Transformer self-attention mechanism, which excels at capturing long-range dependencies, and the Graph Convolutional Network (GCN), which is proficient at capturing local features, thereby achieving fused learning and precise extraction of features from regions of differing importance. Experimental results on the real-world vulnerability datasets Big-Vul and SARD show that VulATGCN achieves an average F1 score of 82.9%, which is 10.4% to 132.9% higher than that of deep learning-based vulnerability detection methods such as VulSniper, VulMPFF, and MGVD, with an average improvement of approximately 52.9%.

Keywords: source code vulnerability detection, Code Property Graph (CPG), Graph Neural Network (GNN), centrality analysis, self-attention mechanism
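The abstract does not give AT-GCN's equations. For orientation only, the block below writes the standard GCN propagation rule and standard scaled dot-product self-attention over the node feature matrix, followed by one plausible adaptive fusion (a learned sigmoid gate). Here A is the n×n adjacency matrix of the CPG, H is the n×d node feature matrix (for example CodeBERT vectors plus centrality features), σ is a nonlinearity, d_k is the key dimension, ‖ is concatenation along the feature axis, and ⊙ is elementwise multiplication. The gate α, its inputs, and the weight names W_G, W_Q, W_K, W_V, W_α are assumptions, not the paper's formulation.

```latex
\begin{aligned}
\tilde{A} &= A + I, \qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij} \\
H_{\mathrm{GCN}} &= \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H W_{G}\right) \\
H_{\mathrm{Att}} &= \operatorname{softmax}\!\left(\frac{(H W_{Q})(H W_{K})^{\top}}{\sqrt{d_k}}\right) H W_{V} \\
\alpha &= \operatorname{sigmoid}\!\left(\left[H_{\mathrm{Att}} \,\Vert\, H_{\mathrm{GCN}}\right] W_{\alpha}\right) \\
H' &= \alpha \odot H_{\mathrm{Att}} + (1 - \alpha) \odot H_{\mathrm{GCN}}
\end{aligned}
```

The design intuition matches the abstract's division of labor: the attention matrix is dense over all n nodes, so a single layer lets any two statements interact regardless of graph distance, whereas one GCN layer mixes only 1-hop neighbors. A per-node, per-feature gate lets the model weight the two views differently for different code regions.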
