Source code vulnerability detection method based on Transformer-GCN

doi:10.11772/j.issn.1001-9081.2024070998

Journal of Computer Applications, 2025, Vol. 45, Issue (7): 2296-2303. DOI: 10.11772/j.issn.1001-9081.2024070998

• Cyber security •

Chen LIANG, Yisen WANG, Qiang WEI, Jiang DU

School of Cyberspace Security, Information Engineering University, Zhengzhou Henan 450001, China

Received: 2024-07-17 Revised: 2024-10-31 Accepted: 2024-10-31 Online: 2025-07-10 Published: 2025-07-10
Contact: Yisen WANG
About author: LIANG Chen, born in 2000, native of Hefei, Anhui, M. S. candidate. His research interests include software component analysis.
WANG Yisen, born in 1990, native of Shenqiu, Henan, Ph. D., associate professor. His research interests include cyber security. E-mail: 851067568@qq.com
WEI Qiang, born in 1979, native of Nanchang, Jiangxi, Ph. D., professor. His research interests include software security and industrial control system security.
DU Jiang, born in 1990, native of Zhengzhou, Henan, Ph. D. candidate. His research interests include binary code similarity.
Supported by:
Henan Province Key Research and Development Program (221111210300)

Abstract:

Existing deep learning-based methods for source code vulnerability detection often lose much of the syntactic and semantic information in the target code, and their neural network models assign unreasonable weights to the nodes (edges) of the code graph. To address these issues, a source code vulnerability detection method named VulATGCN was proposed, based on the Code Property Graph (CPG) and the Adaptive Transformer-Graph Convolutional Network (AT-GCN). In this method, the CPG was used to represent source code, CodeBERT was used for node vectorization, and graph centrality analysis was employed to extract deep structural features, thereby capturing the syntactic and semantic information of the code from multiple dimensions. The AT-GCN model was then designed by combining the strengths of the Transformer self-attention mechanism, which excels at capturing long-range dependencies, and the Graph Convolutional Network (GCN), which is proficient at capturing local features, thereby realizing fusion learning and precise extraction of features from regions of differing importance. Experimental results on the real-world vulnerability datasets Big-Vul and SARD show that VulATGCN achieves an average F1 score of 82.9%, a relative improvement of 10.4% to 132.9% (approximately 52.9% on average) over deep learning-based vulnerability detection methods such as VulSniper, VulMPFF, and MGVD.
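The fusion idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' AT-GCN. It assumes single-head attention without learned projections, a GCN-style propagation step without weight matrices, degree centrality as the structural feature, and a linear blend controlled by a parameter called `graph_weight` (the name appears in the hyperparameter table, but its exact role in the paper is an assumption here). The 2-dimensional node vectors stand in for CodeBERT embeddings and are made up for illustration.

```python
import math

def degree_centrality(adj):
    """Fraction of the other nodes each node is connected to."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]

def gcn_propagate(adj, feats):
    """Local view: D^-1/2 (A + I) D^-1/2 X, GCN propagation without learned weights."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(feats[0])
    return [
        [sum(a_hat[i][j] / math.sqrt(deg[i] * deg[j]) * feats[j][c] for j in range(n))
         for c in range(dim)]
        for i in range(n)
    ]

def self_attention(feats):
    """Global view: scaled dot-product attention with Q = K = V = feats."""
    dim = len(feats[0])
    out = []
    for query in feats:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in feats]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * value[c] for e, value in zip(exps, feats))
                    for c in range(dim)])
    return out

def fuse(adj, feats, graph_weight=0.5):
    """Blend local (graph) and global (attention) node features linearly (assumed scheme)."""
    local = gcn_propagate(adj, feats)
    global_view = self_attention(feats)
    return [
        [graph_weight * l + (1.0 - graph_weight) * g for l, g in zip(l_row, g_row)]
        for l_row, g_row in zip(local, global_view)
    ]

# Toy 4-node path graph 0-1-2-3 standing in for a code property graph.
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
EMBEDDINGS = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5], [0.3, 0.7]]  # made-up vectors

# Append centrality as an extra structural feature, then fuse both views.
CENTRALITY = degree_centrality(ADJ)
FEATURES = [emb + [c] for emb, c in zip(EMBEDDINGS, CENTRALITY)]
FUSED = fuse(ADJ, FEATURES, graph_weight=0.5)
```

Setting `graph_weight=1.0` reduces the result to the purely local graph view, and `graph_weight=0.0` to the purely global attention view, which makes the trade-off between the two feature sources easy to inspect on small graphs.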

Key words: code vulnerability detection, Code Property Graph (CPG), Graph Neural Network (GNN), centrality analysis, self-attention mechanism

CLC Number:

TP311

Chen LIANG, Yisen WANG, Qiang WEI, Jiang DU. Source code vulnerability detection method based on Transformer-GCN[J]. Journal of Computer Applications, 2025, 45(7): 2296-2303.

References

[1]	PLATE H, PONTA S E, SABETTA A. Impact assessment for vulnerabilities in open-source software libraries [C]// Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution. Piscataway: IEEE, 2015: 411-420.
[2]	CROFT R, BABAR M A, KHOLOOSI M M. Data quality for software vulnerability datasets [C]// Proceedings of the IEEE/ACM 45th International Conference on Software Engineering. Piscataway: IEEE, 2023: 121-133.
[3]	CAO X, WANG J, WU P, et al. VulMPFF: a vulnerability detection method for fusing code features in multiple perspectives [J]. IET Information Security, 2024, 2024: No.4313185.
[4]	HU Y T, WANG S Y, WU Y M, et al. Slice-level vulnerability detection and interpretation method based on graph neural network [J]. Journal of Software, 2023, 34(6): 2543-2561. (in Chinese)
[5]	YAMAGUCHI F, GOLDE N, ARP D, et al. Modeling and discovering vulnerabilities with code property graphs [C]// Proceedings of the 2014 IEEE Symposium on Security and Privacy. Piscataway: IEEE, 2014: 590-604.
[6]	FENG Z, GUO D, TANG D, et al. CodeBERT: a pre-trained model for programming and natural languages [C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 1536-1547.
[7]	FAN J, LI Y, WANG S, et al. A C/C++ code vulnerability dataset with code changes and CVE summaries [C]// Proceedings of the IEEE/ACM 17th International Conference on Mining Software Repositories. New York: ACM, 2020: 508-512.
[8]	SAMATE. NIST Software assurance reference dataset [DS/OL]. [2024-06-12].
[9]	HIN D, KAN A, CHEN H, et al. LineVD: statement-level vulnerability detection using graph neural networks [C]// Proceedings of the 19th International Conference on Mining Software Repositories. New York: ACM, 2022: 596-607.
[10]	QIU F, LIU Z, HU X, et al. Vulnerability detection via multiple-graph-based code representation [J]. IEEE Transactions on Software Engineering, 2024, 50(8): 2178-2199.
[11]	LI Z, ZOU D, XU S. VulDeePecker: a deep learning-based system for vulnerability detection [C]// Proceedings of the 2018 Network and Distributed Systems Security Symposium. Reston, VA: Internet Society, 2018: 1-15.
[12]	ZOU D, WANG S, XU S, et al. μVulDeePecker: a deep learning-based system for multiclass vulnerability detection [J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2224-2236.
[13]	DAM H K, TRAN T, PHAM T, et al. Automatic feature learning for predicting vulnerable software components [J]. IEEE Transactions on Software Engineering, 2021, 47(1): 67-85.
[14]	LI X, XIN Y, ZHU H, et al. Cross-domain vulnerability detection using graph embedding and domain adaptation [J]. Computers and Security, 2023, 125: No.103017.
[15]	CHENG X, WANG H, HUA J, et al. DeepWukong: statically detecting software vulnerabilities using deep graph neural network [J]. ACM Transactions on Software Engineering and Methodology, 2021, 30(3): No.38.
[16]	Checkmarx Ltd. Checkmarx [EB/OL]. [2024-03-19].
[17]	WHEELER D A. Flawfinder [EB/OL]. [2024-02-20].
[18]	Secure Software Inc. Rough Audit Tool For Security (RATS) [EB/OL]. [2024-03-19].
[19]	FU M, TANTITHAMTHAVORN C. LineVul: a Transformer-based line-level vulnerability prediction [C]// Proceedings of the 19th International Conference on Mining Software Repositories. New York: ACM, 2022: 608-620.
[20]	PORNPRASIT C, TANTITHAMTHAVORN C K. DeepLineDP: towards a deep learning approach for line-level defect prediction [J]. IEEE Transactions on Software Engineering, 2023, 49(1): 84-98.
[21]	DUAN X, WU J Z, JI S, et al. VulSniper: focus your attention to shoot fine-grained vulnerabilities [C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 4665-4671.
[22]	DOYLE M, WALDEN J. An empirical study of the evolution of PHP Web application security [C]// Proceedings of the 3rd International Workshop on Security Measurements and Metrics. Piscataway: IEEE, 2011: 11-20.
[23]	McCABE T J. A complexity measure [J]. IEEE Transactions on Software Engineering, 1976, SE-2(4): 308-320.
[24]	NAGAPPAN N, BALL T. Use of relative code churn measures to predict system defect density [C]// Proceedings of the 27th International Conference on Software Engineering. New York: ACM, 2005: 284-292.

Hyperparameter		Value range	Step size
GCN	hidden_channels	[64, 512]	64
	num_layers	[2, 5]	1
	dropout	[0.1, 0.5]	0.1
Adaptive attention convolution layer	hidden_channels	[64, 512]	64
	num_layers	[2, 5]	1
	dropout	[0.1, 0.5]	0.1
	num_heads	[1, 8]	1
	alpha	[0.2, 0.8]	0.1
	graph_weight	[0.2, 0.8]	0.1
	aggregate	add, cat	-
Adam learning rate		10^-4, 10^-3, 10^-2	-
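The search space in the table above can be written down as a small configuration grid. The sketch below covers only the adaptive attention convolution layer and the Adam learning rate. The helper names (`inclusive_range`, `iter_configs`) and the key `lr` are hypothetical, and whether the authors searched the full grid or only sampled from it is not stated, so an exhaustive grid is purely an assumption.

```python
import math
from itertools import product

def inclusive_range(lo, hi, step):
    """Values lo, lo+step, ..., hi; counts steps to avoid float drift."""
    n = round((hi - lo) / step)
    return [round(lo + i * step, 10) for i in range(n + 1)]

# Ranges and step sizes copied from the hyperparameter table.
SEARCH_SPACE = {
    "hidden_channels": inclusive_range(64, 512, 64),
    "num_layers": inclusive_range(2, 5, 1),
    "dropout": inclusive_range(0.1, 0.5, 0.1),
    "num_heads": inclusive_range(1, 8, 1),
    "alpha": inclusive_range(0.2, 0.8, 0.1),
    "graph_weight": inclusive_range(0.2, 0.8, 0.1),
    "aggregate": ["add", "cat"],
    "lr": [1e-4, 1e-3, 1e-2],  # Adam learning rate candidates
}

def iter_configs(space):
    """Lazily yield every combination as a dict (exhaustive grid, an assumption)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

# Size of the full grid, computed without enumerating it.
GRID_SIZE = math.prod(len(v) for v in SEARCH_SPACE.values())
FIRST_CONFIG = next(iter_configs(SEARCH_SPACE))
```

The full grid under these assumptions has several hundred thousand combinations, so in practice a random or staged search over the same ranges would be a more economical way to use this space.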

Method	Accuracy/%	Precision/%	Recall/%	F1 score/%
Checkmarx^[16]	30.1	42.1	26.3	33.0
FlawFinder^[17]	38.2	39.9	30.5	34.8
RATS^[18]	39.3	40.8	32.9	35.7
LineVul^[19]	56.2	61.7	68.2	64.8
LineVD^[9]	59.6	63.4	75.8	69.3
DeepLineDP^[20]	54.7	63.2	69.4	66.2
VulSniper^[21]	59.0	60.4	71.2	65.5
VulMPFF^[3]	94.2	30.7	42.4	35.6
MGVD^[10]	83.1	24.1	74.2	36.4
Method of Ref. [4]	72.9	70.5	80.3	75.1
VulATGCN	68.5	70.7	98.0	82.9
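The improvement figures quoted in the abstract can be checked against the F1 column of the table above. The sketch below treats them as relative improvements, (F1_ours - F1_base) / F1_base, over the seven deep learning-based baselines. Which baselines the authors included in their average is an assumption, although this reading reproduces the stated range and mean.

```python
# F1 scores (%) copied from the results table; rule-based tools
# (Checkmarx, FlawFinder, RATS) are left out by assumption.
BASELINE_F1 = {
    "LineVul": 64.8,
    "LineVD": 69.3,
    "DeepLineDP": 66.2,
    "VulSniper": 65.5,
    "VulMPFF": 35.6,
    "MGVD": 36.4,
    "Ref. [4]": 75.1,
}
VULATGCN_F1 = 82.9

# Relative improvement of VulATGCN over each baseline, in percent.
GAINS = {
    name: (VULATGCN_F1 - f1) / f1 * 100.0
    for name, f1 in BASELINE_F1.items()
}

MIN_GAIN = min(GAINS.values())
MAX_GAIN = max(GAINS.values())
MEAN_GAIN = sum(GAINS.values()) / len(GAINS)

for name, gain in sorted(GAINS.items(), key=lambda item: item[1]):
    print(f"{name:<12} {gain:6.1f}%")
print(f"range {MIN_GAIN:.1f}% to {MAX_GAIN:.1f}%, mean {MEAN_GAIN:.1f}%")
```

Under this reading, the smallest gain is over the method of Ref. [4] and the largest over VulMPFF, so the wide range mostly reflects how far apart the baselines' F1 scores are.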
