Journal of Computer Applications, 2024, Vol. 44, Issue (4): 1259-1268. DOI: 10.11772/j.issn.1001-9081.2023040485
Section: Computer software technology
Zexuan WAN, Chunli XIE, Quanrun LYU, Yao LIANG
Received: 2023-04-26
Revised: 2023-07-04
Accepted: 2023-07-10
Online: 2023-12-04
Published: 2024-04-10
Contact: Chunli XIE
About author: WAN Zexuan, born in 1998 in Xuzhou, Jiangsu, M. S. candidate. His research interests include code representation and code clones.
Zexuan WAN, Chunli XIE, Quanrun LYU, Yao LIANG. Code clone detection based on dependency enhanced hierarchical abstract syntax tree[J]. Journal of Computer Applications, 2024, 44(4): 1259-1268.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023040485
Clone type | Percentage (%) |
---|---|
Type-1(T1) | 0.46 |
Type-2(T2) | 0.06 |
Strongly Type-3(ST3) | 0.24 |
Moderately Type-3(MT3) | 1.01 |
Weakly Type-3/Type-4(WT3/T4) | 98.23 |
Tab. 1 Percentage of different clone types in BCB dataset
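Tab. 1 is easier to read alongside how BigCloneBench buckets its non-identical clones. The sketch below assumes the convention commonly used with BCB in earlier clone-detection studies, which this section does not restate: T1 and T2 pairs are identical after normalization, and the remaining pairs are split by a syntactic similarity score into ST3 [0.7, 1.0), MT3 [0.5, 0.7) and WT3/T4 [0.0, 0.5). The function name `type3_category` is hypothetical. The sketch also checks that the shares in Tab. 1 sum to 100% and that Type-3/4 pairs account for 99.48% of the benchmark.

```python
import math

# Clone-type shares copied from Tab. 1 (percent of BCB clone pairs).
BCB_SHARES = {
    "T1": 0.46,
    "T2": 0.06,
    "ST3": 0.24,
    "MT3": 1.01,
    "WT3/T4": 98.23,
}


def type3_category(similarity: float) -> str:
    """Map a syntactic similarity in [0, 1) to a Type-3/4 bucket.

    Assumed thresholds (common BCB convention, not taken from this paper):
    ST3 in [0.7, 1.0), MT3 in [0.5, 0.7), WT3/T4 in [0.0, 0.5).
    T1/T2 pairs are decided by normalization, not by this function.
    """
    if not 0.0 <= similarity < 1.0:
        raise ValueError("similarity must lie in [0, 1)")
    if similarity >= 0.7:
        return "ST3"
    if similarity >= 0.5:
        return "MT3"
    return "WT3/T4"


# The shares should cover every clone pair in the benchmark.
assert math.isclose(sum(BCB_SHARES.values()), 100.0, abs_tol=1e-6)

# Share of pairs that are not syntactically identical (all Type-3/4 buckets).
type3_or_4 = sum(v for k, v in BCB_SHARES.items() if k not in ("T1", "T2"))
print(f"Type-3/4 share: {type3_or_4:.2f}%")
print(type3_category(0.82), type3_category(0.55), type3_category(0.31))
```

Because WT3/T4 pairs dominate BCB, a method's overall BCB score tracks its WT3/T4 result closely; this is consistent with the overall F1 values in Tab. 4 matching the WT3/T4 column of Tab. 3 for every method.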
Statement | Avg. occurrences in BCB | Avg. occurrences in GCJ |
---|---|---|
IfStatement | 2.72 | 3.11 |
WhileStatement | 0.44 | 0.44 |
ForStatement | 0.42 | 4.06 |
BlockStatement | 3.27 | 7.05 |
SwitchStatement | 0.01 | 0.01 |
Tab. 2 Average occurrences of statements in BCB and GCJ datasets
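Tab. 2 reports how often each statement type appears per code fragment on average. The sketch below shows that computation on a hypothetical minimal AST; `Node`, `count_kinds` and `average_occurrences` are illustrative names, not the paper's code. The node-type names match class names used by Java parsers such as javalang, but the parser and counting rules actually used for Tab. 2 are not given in this section.

```python
from collections import Counter
from dataclasses import dataclass, field

# Statement kinds listed in Tab. 2.
STATEMENT_KINDS = ("IfStatement", "WhileStatement", "ForStatement",
                   "BlockStatement", "SwitchStatement")


@dataclass
class Node:
    """Hypothetical minimal AST node: a type name plus child nodes."""
    kind: str
    children: "list[Node]" = field(default_factory=list)


def count_kinds(root: Node) -> Counter:
    """Count every node kind in one AST with an iterative depth-first walk."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.kind] += 1
        stack.extend(node.children)
    return counts


def average_occurrences(roots: list[Node]) -> dict[str, float]:
    """Average count of each statement kind per code fragment (per AST root)."""
    total: Counter = Counter()
    for root in roots:
        total.update(count_kinds(root))
    return {kind: total[kind] / len(roots) for kind in STATEMENT_KINDS}


# Two toy method bodies standing in for code fragments.
method_a = Node("MethodDeclaration", [
    Node("IfStatement", [Node("BlockStatement")]),
    Node("ForStatement", [Node("BlockStatement")]),
])
method_b = Node("MethodDeclaration", [
    Node("WhileStatement", [Node("BlockStatement", [Node("IfStatement")])]),
])
print(average_occurrences([method_a, method_b]))
```

Applying the same ratio arithmetic to Tab. 2 itself, GCJ fragments contain roughly 9.7 times as many ForStatement nodes as BCB fragments (4.06 vs 0.42) and about 2.2 times as many BlockStatement nodes (7.05 vs 3.27).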
Method | T1 | T2 | ST3 | MT3 | WT3/T4 |
---|---|---|---|---|---|
Deckard | 0.73 | 0.71 | 0.54 | 0.21 | 0.03 |
RtvNN | 1.00 | 0.97 | 0.60 | 0.03 | 0.01 |
CDLH | 1.00 | 0.99 | 0.97 | 0.94 | 0.82 |
ASTNN | 1.00 | 1.00 | 0.99 | 0.98 | 0.93 |
FA-AST | 1.00 | 1.00 | 0.99 | 0.99 | 0.95 |
Amain | 1.00 | 1.00 | 1.00 | 0.99 | 0.95 |
TreeCen | 1.00 | 0.99 | 1.00 | 1.00 | 0.95 |
DEHAST | 1.00 | 1.00 | 0.99 | 0.99 | 0.97 |
Tab. 3 F1 scores for each clone type on BCB dataset
Method | Precision | Recall | F1 score |
---|---|---|---|
Deckard | 0.93 | 0.02 | 0.03 |
RtvNN | 0.95 | 0.01 | 0.02 |
CDLH | 0.92 | 0.74 | 0.82 |
ASTNN | 0.92 | 0.94 | 0.93 |
FA-AST | 0.96 | 0.94 | 0.95 |
Amain | 0.95 | 0.96 | 0.95 |
TreeCen | 0.97 | 0.93 | 0.95 |
DEHAST | 0.98 | 0.96 | 0.97 |
Tab. 4 Performance of different methods on BCB dataset
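The F1 column in Tab. 4 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the two-decimal values in the table as a rough consistency check; the helper names and the 0.005 tolerance are illustrative choices, not part of the paper. Every row agrees within that tolerance except Deckard, whose recomputed F1 is about 0.039 against a reported 0.03. Since P and R are themselves rounded, a recall slightly above 0.015 (still shown as 0.02) would give an F1 of about 0.03, so the gap is plausibly a rounding artefact rather than an error.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Tab. 4.
BCB_RESULTS = {
    "Deckard": (0.93, 0.02, 0.03),
    "RtvNN": (0.95, 0.01, 0.02),
    "CDLH": (0.92, 0.74, 0.82),
    "ASTNN": (0.92, 0.94, 0.93),
    "FA-AST": (0.96, 0.94, 0.95),
    "Amain": (0.95, 0.96, 0.95),
    "TreeCen": (0.97, 0.93, 0.95),
    "DEHAST": (0.98, 0.96, 0.97),
}


def inconsistent_rows(results: dict, tol: float = 0.005) -> list[str]:
    """Methods whose reported F1 is more than `tol` away from 2PR/(P+R)."""
    return [name for name, (p, r, f1) in results.items()
            if abs(f1_score(p, r) - f1) > tol]


for name, (p, r, f1) in BCB_RESULTS.items():
    print(f"{name:8s} reported {f1:.2f}  recomputed {f1_score(p, r):.4f}")
print("Outside rounding tolerance:", inconsistent_rows(BCB_RESULTS))
```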
Method | Precision | Recall | F1 score |
---|---|---|---|
Deckard | 0.45 | 0.44 | 0.44 |
RtvNN | 0.20 | 0.90 | 0.33 |
CDLH | 0.82 | 0.66 | 0.73 |
ASTNN | 0.87 | 0.95 | 0.91 |
Amain | 0.93 | 0.91 | 0.92 |
TreeCen | 0.93 | 0.94 | 0.93 |
DEHAST | 0.93 | 0.97 | 0.95 |
Tab. 5 Performance of different methods on GCJ dataset
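Tab. 5 evaluates on GCJ, where clone pairs are derived from programming-competition solutions. A minimal sketch, assuming the labeling convention used in earlier GCJ-based clone studies: two solutions to the same problem form a clone pair and solutions to different problems do not. `build_gcj_pairs` and `label_counts` are hypothetical helpers. If all pairs are used, k problems with n solutions each yield about k - 1 non-clone pairs per clone pair for large n, so precision depends heavily on how a method treats the many non-clone pairs; class imbalance of this kind is one way precision and recall can diverge as sharply as in RtvNN's row (0.20 vs 0.90).

```python
from itertools import combinations


def build_gcj_pairs(solutions):
    """Label every unordered pair of solutions.

    `solutions` is a list of (solution_id, problem_id) tuples. Assumed
    convention: solutions to the same problem are a clone pair (label 1),
    solutions to different problems are not (label 0).
    """
    return [
        (id_a, id_b, int(prob_a == prob_b))
        for (id_a, prob_a), (id_b, prob_b) in combinations(solutions, 2)
    ]


def label_counts(pairs):
    """Return (number of clone pairs, number of non-clone pairs)."""
    positives = sum(label for _, _, label in pairs)
    return positives, len(pairs) - positives


# Hypothetical toy corpus: three problems with two solutions each.
toy = [("s1", "P1"), ("s2", "P1"), ("s3", "P2"),
       ("s4", "P2"), ("s5", "P3"), ("s6", "P3")]
pairs = build_gcj_pairs(toy)
print(label_counts(pairs))
```

The toy corpus yields 15 pairs, of which 3 are clones and 12 are not.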
No. | Ablation setting | Precision | Recall | F1 score |
---|---|---|---|---|
1 | Original AST and its dependencies | 0.88 | 0.91 | 0.89 |
2 | DEHAST without statement-level dependencies | 0.91 | 0.94 | 0.92 |
3 | DEHAST without subtree-level dependencies | 0.89 | 0.86 | 0.87 |
4 | Full DEHAST model | 0.98 | 0.96 | 0.97 |
Tab. 6 Results of ablation experiments
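Tab. 6 compares each ablated variant against the full DEHAST model. The short sketch below computes the absolute drop in each metric and ranks variants by F1 loss; the variant labels paraphrase the table and `metric_drops` is a hypothetical helper. On these numbers, removing subtree-level dependencies costs the most F1 (0.10), mainly through recall (0.96 to 0.86), while removing statement-level dependencies costs the least (0.05).

```python
# (precision, recall, F1) rows copied from Tab. 6.
FULL_MODEL = (0.98, 0.96, 0.97)
ABLATIONS = {
    "original AST + dependencies": (0.88, 0.91, 0.89),
    "no statement-level dependencies": (0.91, 0.94, 0.92),
    "no subtree-level dependencies": (0.89, 0.86, 0.87),
}


def metric_drops(full, variants):
    """Absolute drop in (precision, recall, F1) of each variant vs. the full model.

    Values are rounded to two decimals to remove floating-point noise.
    """
    return {
        name: tuple(round(f - v, 2) for f, v in zip(full, metrics))
        for name, metrics in variants.items()
    }


drops = metric_drops(FULL_MODEL, ABLATIONS)
# Largest F1 drop first; a positive number means the variant scores lower.
for name, (dp, dr, df1) in sorted(drops.items(), key=lambda kv: -kv[1][2]):
    print(f"{name:32s} P drop {dp:.2f}  R drop {dr:.2f}  F1 drop {df1:.2f}")
```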
1 | ROY C K, CORDY J R. A survey on software clone detection research: Technical Report No. 2007-541[R]. Ontario, Canada: Queen’s University at Kingston, School of Computing, 2007: 64-68. |
2 | KAMIYA T, KUSUMOTO S, INOUE K. CCFinder: a multilinguistic token-based code clone detection system for large scale source code[J]. IEEE Transactions on Software Engineering, 2002, 28(7):654-670. 10.1109/tse.2002.1019480 |
3 | LIU C, CHEN C, HAN J, et al. GPLAG: detection of software plagiarism by program dependence graph analysis[C]// Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2006:872-881. 10.1145/1150402.1150522 |
4 | WANG P, SVAJLENKO J, WU Y, et al. CCAligner: a token based large-gap clone detector[C]// Proceedings of the 40th International Conference on Software Engineering. New York: ACM, 2018:1066-1077. 10.1145/3180155.3180179 |
5 | GABEL M, JIANG L, SU Z. Scalable detection of semantic clones[C]// Proceedings of the 30th International Conference on Software Engineering. New York: ACM, 2008: 321-330. 10.1145/1368088.1368132 |
6 | ZHANG J, WANG X, ZHANG H, et al. A novel neural source code representation based on abstract syntax tree[C]// Proceedings of the 41st International Conference on Software Engineering. Piscataway: IEEE, 2019: 783-794. 10.1109/icse.2019.00086 |
7 | WHITE M, TUFANO M, VENDOME C, et al. Deep learning code fragments for code clone detection[C]// Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. New York: ACM, 2016: 87-98. 10.1145/2970276.2970326 |
8 | WANG M, WANG P, XU Y. CCSharp: an efficient three-phase code clone detector using modified PDGs[C]// Proceedings of the 2017 24th Asia-Pacific Software Engineering Conference. Piscataway: IEEE, 2017:100-109. 10.1109/apsec.2017.16 |
9 | WEI H-H, LI M. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2017: 3034-3040. 10.24963/ijcai.2017/423 |
10 | TAI K S, SOCHER R, MANNING C D. Improved semantic representations from tree-structured long short-term memory networks[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2015:1556-1566. 10.3115/v1/p15-1150 |
11 | WANG W, LI G, MA B, et al. Detecting code clones with graph neural network and flow-augmented abstract syntax tree[C]// Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering. Piscataway: IEEE, 2020: 261-271. 10.1109/saner48275.2020.9054857 |
12 | ZHAO G, HUANG J. DeepSim: deep learning code functional similarity[C]// Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2018: 141-151. 10.1145/3236024.3236068 |
13 | LI Y, GU C, DULLIEN T, et al. Graph matching networks for learning the similarity of graph structured objects[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 3835-3845. |
14 | LE Q Y, LIU J X, SUN X P, et al. Survey of research progress of code cloning detection[J]. Computer Science, 2021, 48(11A): 509-522. (in Chinese) 10.11896/jsjkx.210300310 |
15 | DUCASSE S, RIEGER M, DEMEYER S. A language independent approach for detecting duplicated code[C]// Proceedings of the 15th IEEE International Conference on Software Maintenance. Piscataway: IEEE, 1999:109-118. 10.1109/icsm.1999.792593 |
16 | RAGKHITWETSAGUL C, KRINKE J. Using compilation/decompilation to enhance clone detection[C]// Proceedings of the 2017 IEEE 11th International Workshop on Software Clones. Piscataway: IEEE, 2017:1-7. 10.1109/iwsc.2017.7880502 |
17 | ROY C K, CORDY J R. NICAD: accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization[C]// Proceedings of the 2008 IEEE 16th IEEE International Conference on Program Comprehension. Piscataway: IEEE, 2008: 172-181. 10.1109/icpc.2008.41 |
18 | NISHI M A, DAMEVSKI K. Scalable code clone detection and search based on adaptive prefix filtering[J]. Journal of Systems and Software, 2018, 137:130-142. 10.1016/j.jss.2017.11.039 |
19 | LI L, FENG H, ZHUANG W, et al. CCLearner: a deep learning-based clone detection approach[C]// Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution. Piscataway: IEEE, 2017: 249-260. 10.1109/icsme.2017.46 |
20 | SAJNANI H, SAINI V, SVAJLENKO J, et al. SourcererCC: scaling code clone detection to big-code[C]// Proceedings of the 2016 IEEE/ACM 38th International Conference on Software Engineering. Washington, DC: IEEE Computer Society, 2016:1157-1168. 10.1145/2884781.2884877 |
21 | SVAJLENKO J, ROY C K. Fast and flexible large-scale clone detection with CloneWorks[C]// Proceedings of the 39th International Conference on Software Engineering Companion. Piscataway: IEEE, 2017:27-30. 10.1109/icse-c.2017.3 |
22 | CHODAREV S, PIETRIKOVÁ E, KOLLÁR J. Haskell clone detection using pattern comparing algorithm[C]// Proceedings of the 2015 13th International Conference on Engineering of Modern Electric Systems. Piscataway: IEEE, 2015:1-4. 10.1109/emes.2015.7158423 |
23 | YUAN D, FANG S, ZHANG T, et al. Java code clone detection by exploiting semantic and syntax information from intermediate code-based graph[J]. IEEE Transactions on Reliability, 2022, 72(2): 511-526. 10.1109/tr.2022.3176922 |
24 | HU Y, FANG Y, SUN Y, et al. Code2Img: tree-based image transformation for scalable code clone detection[J]. IEEE Transactions on Software Engineering, 2023, 49(9): 4429-4442. 10.1109/tse.2023.3295801 |
25 | MOU L, LI G, ZHANG L, et al. Convolutional neural networks over tree structures for programming language processing[C]// Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2016:1287-1293. 10.1609/aaai.v30i1.10139 |
26 | SCARSELLI F, GORI M, TSOI A C, et al. The graph neural network model[J]. IEEE Transactions on Neural Networks,2009,20(1):61-80. 10.1109/tnn.2008.2005605 |
27 | GILMER J, SCHOENHOLZ S S, RILEY P F, et al. Neural message passing for quantum chemistry[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017:1263-1272. |
28 | ALLAMANIS M, BROCKSCHMIDT M, KHADEMI M. Learning to represent programs with graphs[EB/OL]. [2022-07-30]. . |
29 | NGUYEN M, BUI N D Q. Learning to represent programs with code hierarchies[EB/OL]. (2022-05-31)[2022-07-30]. . 10.48550/arXiv.2205.15479 |
30 | SVAJLENKO J, ISLAM J F, KEIVANLOO I, et al. Towards a big data curated benchmark of inter-project code clones[C]// Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution. Piscataway: IEEE, 2014:476-480. 10.1109/icsme.2014.77 |
31 | Google Code Jam[EB/OL]. [2022-05-26]. . |
32 | FEY M, LENSSEN J E. Fast graph representation learning with PyTorch Geometric[EB/OL]. [2022-08-02]. . |
33 | KINGMA D, BA J. Adam: a method for stochastic optimization[EB/OL]. [2023-04-01]. . |
34 | JIANG L, MISHERGHI G, SU Z, et al. DECKARD: scalable and accurate tree-based detection of code clones[C]// Proceedings of the 29th International Conference on Software Engineering. Piscataway: IEEE, 2007: 96-105. 10.1109/icse.2007.30 |
35 | WU Y, FENG S, ZOU D, et al. Detecting semantic code clones by building AST-based Markov chains model[C]// Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. New York: ACM, 2022: No. 34. 10.1145/3551349.3560426 |
36 | HU Y, ZOU D, PENG J, et al. TreeCen: building tree graph for scalable semantic code clone detection[C]// Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. New York: ACM, 2022: No. 109. 10.1145/3551349.3556927 |