Source code vulnerability detection method based on Transformer-GCN

doi:10.11772/j.issn.1001-9081.2024070998

Journal of Computer Applications, 2025, Vol. 45, Issue (7): 2296-2303. DOI: 10.11772/j.issn.1001-9081.2024070998


Chen LIANG, Yisen WANG, Qiang WEI, Jiang DU

School of Cyberspace Security, Information Engineering University, Zhengzhou Henan 450001, China

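Received: 2024-07-17 Revised: 2024-10-31 Accepted: 2024-10-31 Online: 2025-07-10 Published: 2025-07-10
Corresponding author: Yisen WANG
About the authors: LIANG Chen, born in 2000, from Hefei, Anhui, M. S. candidate. His research interests include software component analysis.
WEI Qiang, born in 1979, from Nanchang, Jiangxi, Ph. D., professor. His research interests include software security and industrial control system security.
DU Jiang, born in 1990, from Zhengzhou, Henan, Ph. D. candidate. His research interests include binary code similarity.
Supported by:
Henan Province Key Research and Development Program (221111210300)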



Abstract:

Existing deep learning-based source code vulnerability detection methods suffer from two problems: severe loss of the syntactic and semantic information of the target code, and unreasonable allocation of weights to the nodes (edges) of the target code graph by the neural network models. To address these problems, a source code vulnerability detection method named VulATGCN was proposed on the basis of the Code Property Graph (CPG) and an Adaptive Transformer-Graph Convolutional Network (AT-GCN). In the method, the CPG was used to represent the source code, CodeBERT was used to vectorize the graph nodes, and graph centrality analysis was employed to extract deep structural features, so that the syntactic and semantic information of the code was captured along multiple dimensions. Then, the AT-GCN model was designed to combine two complementary strengths: the Transformer self-attention mechanism, which excels at capturing long-range dependencies, and the Graph Convolutional Network (GCN), which excels at capturing local features. This combination enables fused learning and precise extraction of features from regions of differing importance. Experimental results on the real-world vulnerability datasets Big-Vul and SARD show that VulATGCN achieves an average F1 score of 82.9%, a relative improvement of 10.4% to 132.9% (approximately 52.9% on average) over deep learning-based vulnerability detection methods such as VulSniper, VulMPFF and MGVD.

Key words: source code vulnerability detection, Code Property Graph (CPG), Graph Neural Network (GNN), centrality analysis, self-attention mechanism
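The abstract says that graph centrality analysis adds deep structural features on top of the CodeBERT node vectors, but it does not name the centrality measures or say how they are combined with the vectors. A minimal sketch, assuming degree and closeness centrality computed on an undirected view of the code graph and simply appended to each node vector, could look like this. The node names, the toy 2-dimensional vectors (standing in for CodeBERT's 768-dimensional embeddings) and all function names are hypothetical.

```python
from collections import deque

# Hypothetical sketch: degree and closeness centrality on an undirected view of a
# code graph, appended to each node's embedding vector.


def to_undirected(nodes, edges):
    """Build an undirected adjacency map, ignoring direction, duplicates and self-loops."""
    adj = {v: set() for v in nodes}
    for u, w in edges:
        if u != w:
            adj[u].add(w)
            adj[w].add(u)
    return adj


def degree_centrality(adj):
    """Node degree normalized by n - 1."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """BFS-based closeness, scaled by the reachable fraction so that
    disconnected graphs are handled (Wasserman-Faust variant)."""
    n = len(adj)
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        reached = len(dist) - 1  # nodes reachable from src, excluding src itself
        if total > 0 and n > 1:
            scores[src] = (reached / total) * (reached / (n - 1))
        else:
            scores[src] = 0.0
    return scores


def augment_features(embeddings, edges):
    """Append [degree, closeness] to each node's embedding (e.g. a CodeBERT vector)."""
    adj = to_undirected(embeddings.keys(), edges)
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    return {v: list(vec) + [deg[v], clo[v]] for v, vec in embeddings.items()}


if __name__ == "__main__":
    # Toy 2-dimensional vectors stand in for real CodeBERT embeddings.
    emb = {"decl": [0.1, 0.2], "call": [0.3, 0.4], "cond": [0.5, 0.6], "ret": [0.7, 0.8]}
    edges = [("decl", "call"), ("call", "cond"), ("call", "ret")]
    feats = augment_features(emb, edges)
    print(feats["call"])  # [0.3, 0.4, 1.0, 1.0]
    print(feats["decl"])
```

Ignoring edge direction is a simplification: a CPG merges directed abstract syntax tree, control-flow and program-dependence edges, so a directed or per-edge-type variant may be more faithful. The normalizations follow the common definitions, which are also the defaults NetworkX applies to undirected graphs.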

CLC number: TP311

Cite this article: Chen LIANG, Yisen WANG, Qiang WEI, Jiang DU. Source code vulnerability detection method based on Transformer-GCN[J]. Journal of Computer Applications, 2025, 45(7): 2296-2303.

Figures/Tables: 7

References (24)

[1]	PLATE H, PONTA S E, SABETTA A. Impact assessment for vulnerabilities in open-source software libraries [C]// Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution. Piscataway: IEEE, 2015: 411-420.
[2]	CROFT R, BABAR M A, KHOLOOSI M M. Data quality for software vulnerability datasets [C]// Proceedings of the IEEE/ACM 45th International Conference on Software Engineering. Piscataway: IEEE, 2023: 121-133.
[3]	CAO X, WANG J, WU P, et al. VulMPFF: a vulnerability detection method for fusing code features in multiple perspectives [J]. IET Information Security, 2024, 2024: No.4313185.
[4]	HU Y T, WANG S Y, WU Y M, et al. Slice-level vulnerability detection and interpretation method based on graph neural network [J]. Journal of Software, 2023, 34(6): 2543-2561. (in Chinese)
[5]	YAMAGUCHI F, GOLDE N, ARP D, et al. Modeling and discovering vulnerabilities with code property graphs [C]// Proceedings of the 2014 IEEE Symposium on Security and Privacy. Piscataway: IEEE, 2014: 590-604.
[6]	FENG Z, GUO D, TANG D, et al. CodeBERT: a pre-trained model for programming and natural languages [C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 1536-1547.
[7]	FAN J, LI Y, WANG S, et al. A C/C++ code vulnerability dataset with code changes and CVE summaries [C]// Proceedings of the IEEE/ACM 17th International Conference on Mining Software Repositories. New York: ACM, 2020: 508-512.
[8]	SAMATE. NIST Software assurance reference dataset [DS/OL]. [2024-06-12].
[9]	HIN D, KAN A, CHEN H, et al. LineVD: statement-level vulnerability detection using graph neural networks [C]// Proceedings of the 19th International Conference on Mining Software Repositories. New York: ACM, 2022: 596-607.
[10]	QIU F, LIU Z, HU X, et al. Vulnerability detection via multiple-graph-based code representation [J]. IEEE Transactions on Software Engineering, 2024, 50(8): 2178-2199.
[11]	LI Z, ZOU D, XU S. VulDeePecker: a deep learning-based system for vulnerability detection [C]// Proceedings of the 2018 Network and Distributed Systems Security Symposium. Reston, VA: Internet Society, 2018: 1-15.
[12]	ZOU D, WANG S, XU S, et al. μVulDeePecker: a deep learning-based system for multiclass vulnerability detection [J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2224-2236.
[13]	DAM H K, TRAN T, PHAM T, et al. Automatic feature learning for predicting vulnerable software components [J]. IEEE Transactions on Software Engineering, 2021, 47(1): 67-85.
[14]	LI X, XIN Y, ZHU H, et al. Cross-domain vulnerability detection using graph embedding and domain adaptation [J]. Computers and Security, 2023, 125: No.103017.
[15]	CHENG X, WANG H, HUA J, et al. DeepWukong: statically detecting software vulnerabilities using deep graph neural network [J]. ACM Transactions on Software Engineering and Methodology, 2021, 30(3): No.38.
[16]	Checkmarx Ltd. Checkmarx [EB/OL]. [2024-03-19].
[17]	WHEELER D A. Flawfinder [EB/OL]. [2024-02-20].
[18]	Secure Software Inc. Rough Audit Tool For Security (RATS) [EB/OL]. [2024-03-19].
[19]	FU M, TANTITHAMTHAVORN C. LineVul: a Transformer-based line-level vulnerability prediction [C]// Proceedings of the 19th International Conference on Mining Software Repositories. New York: ACM, 2022: 608-620.
[20]	PORNPRASIT C, TANTITHAMTHAVORN C K. DeepLineDP: towards a deep learning approach for line-level defect prediction [J]. IEEE Transactions on Software Engineering, 2023, 49(1): 84-98.
[21]	DUAN X, WU J Z, JI S, et al. VulSniper: focus your attention to shoot fine-grained vulnerabilities [C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 4665-4671.
[22]	DOYLE M, WALDEN J. An empirical study of the evolution of PHP Web application security [C]// Proceedings of the 3rd International Workshop on Security Measurements and Metrics. Piscataway: IEEE, 2011: 11-20.
[23]	McCABE T J. A complexity measure [J]. IEEE Transactions on Software Engineering, 1976, SE-2(4): 308-320.
[24]	NAGAPPAN N, BALL T. Use of relative code churn measures to predict system defect density [C]// Proceedings of the 27th International Conference on Software Engineering. New York: ACM, 2005: 284-292.

Module	Hyperparameter	Search range	Step
GCN	hidden_channels	[64, 512]	64
	num_layers	[2, 5]	1
	dropout	[0.1, 0.5]	0.1
Adaptive attention convolution layer	hidden_channels	[64, 512]	64
	num_layers	[2, 5]	1
	dropout	[0.1, 0.5]	0.1
	num_heads	[1, 8]	1
	alpha	[0.2, 0.8]	0.1
	graph_weight	[0.2, 0.8]	0.1
	aggregate	add, cat	-
Adam learning rate		10^-4, 10^-3, 10^-2	-
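The search space above names the knobs of the adaptive attention convolution layer (hidden_channels, num_heads, graph_weight, aggregate) but not its equations. The NumPy sketch below is therefore only an assumption about how such a layer could mix the two branches the abstract describes: a GCN branch with the standard symmetric normalization D^-1/2 (A + I) D^-1/2 for local features, and a global multi-head self-attention branch for long-range dependencies, blended by graph_weight when aggregate is "add" and concatenated when it is "cat". alpha, dropout and layer stacking are left out, the weights are random rather than trained, and hybrid_layer, init_params and the other helpers are hypothetical names.

```python
import numpy as np

# Hypothetical sketch of one "adaptive attention convolution" layer: a GCN branch
# (local structure) blended with global multi-head self-attention (long range).


def gcn_propagate(adj, x, w):
    """One GCN propagation step: D^-1/2 (A + I) D^-1/2 X W."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees >= 1, so no division by zero
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm @ x @ w


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, wq, wk, wv, num_heads):
    """Scaled dot-product self-attention over all nodes, ignoring graph edges."""
    n, hidden = x.shape[0], wq.shape[1]
    if hidden % num_heads != 0:
        raise ValueError("hidden_channels must be divisible by num_heads")
    d = hidden // num_heads

    def split(h):  # (n, hidden) -> (num_heads, n, d)
        return h.reshape(n, num_heads, d).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (num_heads, n, n)
    out = softmax(scores, axis=-1) @ v               # (num_heads, n, d)
    return out.transpose(1, 0, 2).reshape(n, hidden)


def hybrid_layer(adj, x, params, num_heads=4, graph_weight=0.5, aggregate="add"):
    local = np.maximum(gcn_propagate(adj, x, params["w_gcn"]), 0.0)  # ReLU
    global_ = multi_head_attention(x, params["wq"], params["wk"], params["wv"], num_heads)
    if aggregate == "add":
        return graph_weight * local + (1.0 - graph_weight) * global_
    if aggregate == "cat":
        return np.concatenate([local, global_], axis=1)
    raise ValueError(f"unknown aggregate: {aggregate!r}")


def init_params(in_dim, hidden_channels, seed=0):
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(in_dim)
    return {name: rng.normal(0.0, scale, size=(in_dim, hidden_channels))
            for name in ("w_gcn", "wq", "wk", "wv")}


if __name__ == "__main__":
    # Star-shaped toy graph: node 1 is linked to nodes 0, 2 and 3.
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 1],
                    [0, 1, 0, 0],
                    [0, 1, 0, 0]], dtype=float)
    x = np.random.default_rng(1).normal(size=(4, 8))   # toy node features
    params = init_params(in_dim=8, hidden_channels=64)
    print(hybrid_layer(adj, x, params).shape)                   # (4, 64)
    print(hybrid_layer(adj, x, params, aggregate="cat").shape)  # (4, 128)
```

Because this sketch splits hidden_channels evenly across heads, it only accepts combinations where hidden_channels is divisible by num_heads (for example 64 with 1, 2, 4 or 8 heads); how the original layer handles the other values in the search range is not stated.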


Method	Accuracy/%	Precision/%	Recall/%	F1 score/%
Checkmarx [16]	30.1	42.1	26.3	33.0
Flawfinder [17]	38.2	39.9	30.5	34.8
RATS [18]	39.3	40.8	32.9	35.7
LineVul [19]	56.2	61.7	68.2	64.8
LineVD [9]	59.6	63.4	75.8	69.3
DeepLineDP [20]	54.7	63.2	69.4	66.2
VulSniper [21]	59.0	60.4	71.2	65.5
VulMPFF [3]	94.2	30.7	42.4	35.6
MGVD [10]	83.1	24.1	74.2	36.4
Method of Ref. [4]	72.9	70.5	80.3	75.1
VulATGCN	68.5	70.7	98.0	82.9
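The gains quoted in the abstract (10.4% to 132.9%, about 52.9% on average) can be read as relative F1 improvements, (F1_VulATGCN - F1_baseline) / F1_baseline, over the seven deep learning baselines in the comparison table (LineVul through the method of Ref. [4]), with the three static analysis tools Checkmarx, Flawfinder and RATS excluded. This reading is inferred from the numbers rather than stated in the source; the short script below reproduces all three figures from the F1 column.

```python
# F1 scores (%) copied from the comparison table: the seven deep learning baselines.
BASELINE_F1 = {
    "LineVul": 64.8,
    "LineVD": 69.3,
    "DeepLineDP": 66.2,
    "VulSniper": 65.5,
    "VulMPFF": 35.6,
    "MGVD": 36.4,
    "Ref. [4]": 75.1,
}
VULATGCN_F1 = 82.9


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


gains = {name: relative_gain(VULATGCN_F1, f1) for name, f1 in BASELINE_F1.items()}

if __name__ == "__main__":
    for name, g in sorted(gains.items(), key=lambda kv: kv[1]):
        print(f"{name:<11}{g:6.1f}%")
    mean = sum(gains.values()) / len(gains)
    print(f"min {min(gains.values()):.1f}%, max {max(gains.values()):.1f}%, mean {mean:.1f}%")
    # min 10.4%, max 132.9%, mean 52.9%
```

The smallest gain is over the method of Ref. [4] and the largest over VulMPFF, matching the endpoints of the quoted range.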


