Source code vulnerability detection based on relational graph convolution network

doi:10.11772/j.issn.1001-9081.2021091691

Journal of Computer Applications, 2022, Vol. 42, Issue (6): 1814-1821. DOI: 10.11772/j.issn.1001-9081.2021091691

Special topic: The 18th CCF China Conference on Information Systems and Applications

Min WEN¹,², Rongcun WANG¹,²,³, Shujuan JIANG¹,²

¹ Engineering Research Center of Mine Digitalization, Ministry of Education (China University of Mining and Technology), Xuzhou Jiangsu 221116, China
² School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 221116, China
³ Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology (Nanjing University of Aeronautics and Astronautics), Nanjing Jiangsu 211106, China

Received: 2021-09-29 Revised: 2021-11-16 Accepted: 2021-11-17 Online: 2022-04-15 Published: 2022-06-10
Contact: Rongcun WANG
About author: WEN Min, born in 1996 in Shaodong, Hunan, M. S. candidate. Her research interests include vulnerability detection.
JIANG Shujuan, born in 1966 in Laiyang, Shandong, Ph. D., professor, CCF member. Her research interests include software analysis and testing, and compilation technology.
Supported by:
National Natural Science Foundation of China (61673384); Natural Science Foundation of Jiangsu Province (BK20181353); Open Fund of Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology (1015-56XCA18164)

Abstract:

The root of software security lies in the source code written by developers. As software grows in size and complexity, manual vulnerability detection becomes costly and hard to scale, while existing code analysis tools suffer from high false positive and false negative rates. To address this, an automated vulnerability detection method based on the Relational Graph Convolution Network (RGCN) is proposed to further improve detection accuracy. First, the program source code is transformed into a Code Property Graph (CPG) containing syntactic and semantic feature information. Then, RGCN is used to perform representation learning on the graph structure. Finally, a neural network model is trained to predict vulnerabilities in the program source code. To verify the effectiveness of the proposed method, experiments were conducted on real-world software vulnerability samples. The results show that the recall and F1-measure of the proposed method reach 80.27% and 63.78% respectively. Compared with Flawfinder, VulDeePecker and a similar method based on the Graph Convolution Network (GCN), the proposed method improves the F1-measure by 182%, 12% and 55% respectively, which indicates that it can effectively improve vulnerability detection capability.

Key words: vulnerability detection, Code Property Graph (CPG), Relational Graph Convolution Network (RGCN), deep learning, prediction model

CLC Number: TP311

Cite this article:

Min WEN, Rongcun WANG, Shujuan JIANG. Source code vulnerability detection based on relational graph convolution network[J]. Journal of Computer Applications, 2022, 42(6): 1814-1821.

Figures/Tables 9

Fig. 1 Graph representation of code

Fig. 2 Code property graph

Fig. 3 RGCN model

Fig. 4 Framework of the proposed method

Tab. 1 Edge types

Category	Edge type
Data dependence edge	DEF
Data dependence edge	USE
Data dependence edge	REACHES
Control dependence edge	FLOWS_TO
Control dependence edge	CONTROLS
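
The role of the edge types in Tab. 1 can be illustrated with a minimal sketch of one relational graph convolution step. It assumes the standard RGCN propagation rule of Ref. [25], in which each relation (here, each edge type) has its own weight matrix and neighbor messages are averaged per relation. This is not the authors' implementation: the toy graph, the feature sizes, the random weights and the mean readout are hypothetical and chosen only for illustration.

```python
import numpy as np

# Edge types from Tab. 1, used here as RGCN relation labels.
EDGE_TYPES = ["DEF", "USE", "REACHES", "FLOWS_TO", "CONTROLS"]


def rgcn_layer(h, edges, w_rel, w_self):
    """One RGCN propagation step with per-relation mean normalisation.

    h      : (num_nodes, d_in) node feature matrix
    edges  : list of (src, dst, edge_type) triples
    w_rel  : dict mapping edge_type -> (d_in, d_out) weight matrix
    w_self : (d_in, d_out) self-connection weight matrix
    """
    num_nodes = h.shape[0]
    out = h @ w_self  # self-connection term
    for rel in EDGE_TYPES:
        # Sum incoming messages and count in-neighbours for this relation.
        agg = np.zeros_like(h)
        deg = np.zeros(num_nodes)
        for src, dst, edge_type in edges:
            if edge_type == rel:
                agg[dst] += h[src]
                deg[dst] += 1
        # Divide by the neighbour count; nodes without neighbours keep a zero message.
        norm = np.where(deg > 0, deg, 1.0)[:, None]
        out += (agg / norm) @ w_rel[rel]
    return np.maximum(out, 0.0)  # ReLU activation


# Hypothetical toy graph: 4 statement nodes with 3-dimensional features.
rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 3))
toy_edges = [
    (0, 1, "FLOWS_TO"),
    (1, 2, "FLOWS_TO"),
    (0, 2, "REACHES"),
    (1, 3, "CONTROLS"),
]
w_rel = {rel: rng.normal(size=(3, 2)) for rel in EDGE_TYPES}
w_self = rng.normal(size=(3, 2))

h1 = rgcn_layer(h0, toy_edges, w_rel, w_self)
graph_embedding = h1.mean(axis=0)  # simple mean readout (an assumption)
print(h1.shape, graph_embedding.shape)
```

Keeping a separate weight matrix per edge type is what lets the layer treat a data dependence edge differently from a control dependence edge, which a plain GCN with a single shared weight matrix cannot do.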

Tab. 2 Statistics of vulnerability dataset

Open-source project	Vul	Non-vul
FFmpeg	4 981	4 788
QEMU	7 479	10 070
Total	12 460	14 858

Tab. 3 Datasets generated by different methods

Method	Vul	Non-vul	Total
VulDeePecker	9 117	9 875	18 992
Method of Ref. [20]	5 921	8 129	14 050
Proposed method	11 905	9 713	21 618

Tab. 4 Performance comparison of different methods (%)

Method	Accuracy	Precision	Recall	F1
Flawfinder	54.33	49.77	14.65	22.64
VulDeePecker	58.57	53.60	62.73	57.18
CDT+GCN	54.41	45.18	37.94	41.25
CDT+RGCN	56.40	48.70	63.41	55.09
Joern+GCN	52.20	46.74	45.02	45.86
Joern+RGCN (proposed)	58.99	52.91	80.27	63.78
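
The relative F1 improvements quoted in the abstract can be checked against Tab. 4 with a short calculation. The sketch below assumes that the GCN-based baseline behind the 55% figure is the CDT+GCN row; this is an inference from the numbers, not something the page states. The function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


# F1 values (%) copied from Tab. 4.
f1 = {
    "Flawfinder": 22.64,
    "VulDeePecker": 57.18,
    "CDT+GCN": 41.25,  # assumed to be the GCN baseline meant in the abstract
    "Joern+RGCN": 63.78,
}

# Precision and recall of Joern+RGCN reproduce its reported F1.
print(round(f1_score(52.91, 80.27), 2))  # 63.78

# Relative F1 gains of the proposed method over each baseline.
for baseline in ("Flawfinder", "VulDeePecker", "CDT+GCN"):
    gain = relative_gain(f1["Joern+RGCN"], f1[baseline])
    print(baseline, round(gain))  # 182, 12 and 55 respectively
```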

Fig. 5 ROC curve

References 30

1	WU S Z, GUO T, DONG G W, et al. Software vulnerability analyses: a road map[J]. Journal of Tsinghua University (Science and Technology), 2012, 52(10): 1309-1319.
2	LI Z J, ZHANG J X, LIAO X K, et al. Survey of software vulnerability detection techniques[J]. Chinese Journal of Computers, 2015, 38(4): 717-732. DOI: 10.3724/SP.J.1016.2015.00717
3	LI Z, ZOU D Q, WANG Z L, et al. Survey on static software vulnerability detection for source code[J]. Chinese Journal of Network and Information Security, 2019, 5(1): 1-14. DOI: 10.11959/j.issn.2096-109x.2019001
4	CADAR C, DUNBAR D, ENGLER D. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs[C]// Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. Berkeley: USENIX Association, 2008: 209-224.
5	CHIPOUNOV V, KUZNETSOV V, CANDEA G. S2E: a platform for in-vivo multi-path analysis of software systems[C]// Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2011: 265-278. DOI: 10.1145/1950365.1950396
6	BALDONI R, COPPA E, D’ELIA D C, et al. A survey of symbolic execution techniques[J]. ACM Computing Surveys, 2018, 51(3): No. 50. DOI: 10.1145/3182657
7	GODEFROID P, LEVIN M Y, MOLNAR D A. Automated whitebox fuzz testing[C/OL]// Proceedings of the 2008 Network and Distributed System Security Symposium. [2021-03-14]. DOI: 10.1145/2093548.2093564
8	LI Y, HUANG C L, WANG Z F, et al. Survey of software vulnerability mining methods based on machine learning[J]. Journal of Software, 2020, 31(7): 2040-2061. DOI: 10.13328/j.cnki.jos.006055
9	SUN H Y, HE Y, WANG J C, et al. Application of artificial intelligence technology in the field of security vulnerability[J]. Journal on Communications, 2018, 39(8): 1-17. DOI: 10.11959/j.issn.1000-436x.2018137
10	SHIN Y, WILLIAMS L. Can traditional fault prediction models be used for vulnerability prediction?[J]. Empirical Software Engineering, 2013, 18(1): 25-59. DOI: 10.1007/s10664-011-9190-8
11	YOUNIS A, MALAIYA Y, ANDERSON C, et al. To fear or not to fear that is the question: code characteristics of a vulnerable function with an existing exploit[C]// Proceedings of the 6th ACM Conference on Data and Application Security and Privacy. New York: ACM, 2016: 97-104. DOI: 10.1145/2857705.2857750
12	WALDEN J, STUCKMAN J, SCANDARIATO R. Predicting vulnerable components: software metrics vs text mining[C]// Proceedings of the IEEE 25th International Symposium on Software Reliability Engineering. Piscataway: IEEE, 2014: 23-33. DOI: 10.1109/issre.2014.32
13	SHIN Y, MENEELY A, WILLIAMS L, et al. Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities[J]. IEEE Transactions on Software Engineering, 2011, 37(6): 772-787. DOI: 10.1109/tse.2010.81
14	BOSU A, CARVER J, HAFIZ M, et al. Identifying the characteristics of vulnerable code changes: an empirical study[C]// Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. New York: ACM, 2014: 257-268. DOI: 10.1145/2635868.2635880
15	WANG F X, LI F. Software vulnerability automatic classification framework based on activation vulnerability conditions[J]. Journal of Chongqing University of Technology (Natural Science), 2019, 33(5): 154-160.
16	YAMAGUCHI F, LOTTMANN M, RIECK K. Generalized vulnerability extrapolation using abstract syntax trees[C]// Proceedings of the 28th Annual Computer Security Applications Conference. New York: ACM, 2012: 359-368. DOI: 10.1145/2420950.2421003
17	RUSSELL R, KIM L, HAMILTON L, et al. Automated vulnerability detection in source code using deep representation learning[C]// Proceedings of the 17th IEEE International Conference on Machine Learning and Applications. Piscataway: IEEE, 2018: 757-762. DOI: 10.1109/icmla.2018.00120
18	DUAN X, WU J Z, JI S L, et al. VulSniper: focus your attention to shoot fine-grained vulnerabilities[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 4665-4671. DOI: 10.24963/ijcai.2019/648
19	LI Z, ZOU D Q, XU S H, et al. VulDeePecker: a deep learning-based system for vulnerability detection[C/OL]// Proceedings of the 2018 Network and Distributed Systems Security Symposium. [2021-03-14]. DOI: 10.14722/ndss.2018.23158
20	KONG W X, YE G X, WANG H T, et al. A source code vulnerability detection method based on graph convolution network: CN, 202010168037.0[P]. 2020-07-28.
21	YAMAGUCHI F, GOLDE N, ARP D, et al. Modeling and discovering vulnerabilities with code property graphs[C]// Proceedings of the 2014 IEEE Symposium on Security and Privacy. Piscataway: IEEE, 2014: 590-604. DOI: 10.1109/sp.2014.44
22	MOONEN L. Generating robust parsers using island grammars[C]// Proceedings of the 8th Working Conference on Reverse Engineering. Piscataway: IEEE, 2001: 13-22. DOI: 10.1109/wcre.2001.957806
23	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2021-04-14].
24	GILMER J, SCHOENHOLZ S S, RILEY P E, et al. Neural message passing for quantum chemistry[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 1263-1272.
25	SCHLICHTKRULL M, KIPF T N, BLOEM P, et al. Modeling relational data with graph convolutional networks[C]// Proceedings of the 2018 European Semantic Web Conference, LNCS 10843/LNISA 10843. Cham: Springer, 2018: 593-607. DOI: 10.1007/978-3-319-93417-4_38
26	LE Q, MIKOLOV T. Distributed representations of sentences and documents[C]// Proceedings of the 31st International Conference on Machine Learning. New York: JMLR.org, 2014: 1188-1196.
27	ISPIROVA G, EFTIMOV T, SELJAK B K. Comparing semantic and nutrient value similarities of recipes[C]// Proceedings of the 2019 IEEE International Conference on Big Data. Piscataway: IEEE, 2019: 5131-5139. DOI: 10.1109/bigdata47090.2019.9006080
28	ZHOU Y Q, LIU S Q, SIOW J, et al. Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2021-01-12].
29	ZOU D Q, WANG S J, XU S H, et al. μVulDeePecker: a deep learning-based system for multiclass vulnerability detection[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2224-2236.
30	LI Z, ZOU D Q, XU S H, et al. SySeVR: a framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing, 2021 (Early Access): 3051525. DOI: 10.1109/tdsc.2021.3051525
