Journal of Computer Applications, 2023, Vol. 43, Issue (12): 3896-3902. DOI: 10.11772/j.issn.1001-9081.2022111783
• Computer software technology •
Qihong SONG1,2, Jianxun LIU1,2, Haize HU1,2, Xiangping ZHANG1,2
Received: 2022-11-29
Revised: 2023-03-25
Accepted: 2023-03-28
Online: 2023-05-08
Published: 2023-12-10
Contact: Jianxun LIU
About author: SONG Qihong, born in 1998 in Baoji, Shaanxi, M. S. candidate, CCF member. His research interests include code search and code completion.
Qihong SONG, Jianxun LIU, Haize HU, Xiangping ZHANG. Code search model based on collaborative fusion network[J]. Journal of Computer Applications, 2023, 43(12): 3896-3902.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022111783
Programming language | Training pairs | Validation pairs | Test pairs | Total pairs |
---|---|---|---|---|
All languages | 1 880 853 | 89 154 | 100 529 | 2 070 536 |
Python | 412 178 | 23 107 | 22 176 | 457 461 |
JavaScript | 123 889 | 8 253 | 6 483 | 138 625 |
Ruby | 48 791 | 2 209 | 2 279 | 53 279 |
Go | 317 832 | 14 242 | 14 291 | 346 365 |
Java | 454 451 | 15 328 | 26 909 | 496 688 |
PHP | 523 712 | 26 015 | 28 391 | 578 118 |
Tab.1 Details about CodeSearchNet corpus
Model | SR@1 | SR@5 | SR@10 | MRR | NDCG |
---|---|---|---|---|---|
UNIF | 0.420 | 0.556 | 0.624 | 0.419 | 0.451 |
TabCS | 0.547 | 0.683 | 0.748 | 0.539 | 0.569 |
MRCS | 0.719 | 0.828 | 0.871 | 0.702 | 0.741 |
BofeCS | 0.848 | 0.942 | 0.972 | 0.821 | 0.857 |
Tab.2 Results of comparison experiment of four models on code search task
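The columns in these tables are ranking metrics. The page does not reproduce the authors' formulas, so the following is only a minimal sketch using common textbook definitions, assuming each query has exactly one correct snippet and that `ranks` (a hypothetical input) holds its 1-based position in the returned list. Under these definitions MRR can never fall below SR@1, so the reported values suggest the authors used a different variant (for example a truncated one).

```python
import math

def success_rate_at_k(ranks, k):
    """SR@k: fraction of queries whose correct snippet appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """MRR: mean of 1/rank of the correct snippet over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def mean_ndcg(ranks):
    """NDCG with a single relevant item: the ideal DCG is 1,
    so each query contributes 1 / log2(rank + 1)."""
    return sum(1.0 / math.log2(r + 1) for r in ranks) / len(ranks)

# Hypothetical ranks of the correct snippet for five queries.
ranks = [1, 3, 1, 12, 2]
sr1 = success_rate_at_k(ranks, 1)
sr5 = success_rate_at_k(ranks, 5)
sr10 = success_rate_at_k(ranks, 10)
mrr = mean_reciprocal_rank(ranks)
ndcg = mean_ndcg(ranks)
print(sr1, sr5, sr10, round(mrr, 3), round(ndcg, 3))
```

All five metrics lie in [0, 1], and higher is better, which is how the rows of the tables are compared.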
Model input | SR@1 | SR@5 | SR@10 | MRR | NDCG |
---|---|---|---|---|---|
T | 0.848 | 0.942 | 0.972 | 0.821 | 0.857 |
T + SBT | 0.844 | 0.935 | 0.966 | 0.820 | 0.855 |
T + LCRS | 0.517 | 0.719 | 0.812 | 0.509 | 0.579 |
T + RootPath | 0.769 | 0.883 | 0.925 | 0.748 | 0.790 |
T + LeafPath | 0.492 | 0.698 | 0.797 | 0.488 | 0.559 |
Tab.3 Influence of tree sequence on BofeCS performance
Loss function | SR@1 | SR@5 | SR@10 | MRR | NDCG |
---|---|---|---|---|---|
S | 0.848 | 0.942 | 0.972 | 0.821 | 0.857 |
M | 0.825 | 0.931 | 0.967 | 0.712 | 0.774 |
Tab.4 Performance of two loss functions
Method | SR@1 | SR@5 | SR@10 | MRR | NDCG |
---|---|---|---|---|---|
Max pooling | 0.848 | 0.942 | 0.972 | 0.821 | 0.857 |
Average pooling | 0.449 | 0.811 | 0.960 | 0.476 | 0.587 |
Tab.5 Performance of two pooling operations
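For readers unfamiliar with the two pooling operations compared here, this is a minimal pure-Python sketch of how each collapses a sequence of token vectors into one fixed-length vector. It illustrates the generic operations only. Where BofeCS applies pooling and on which tensors is not stated on this page, and the names `max_pool`, `mean_pool` and `token_vectors` are hypothetical.

```python
def max_pool(token_vectors):
    """Keep, for every feature dimension, the largest value across tokens."""
    return [max(column) for column in zip(*token_vectors)]

def mean_pool(token_vectors):
    """Average every feature dimension across tokens."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]

# Three hypothetical token embeddings with two features each.
token_vectors = [
    [1.0, 4.0],
    [3.0, 0.0],
    [2.0, 2.0],
]
pooled_max = max_pool(token_vectors)
pooled_mean = mean_pool(token_vectors)
print(pooled_max, pooled_mean)
```

Max pooling keeps the strongest activation per dimension, while average pooling blends all tokens equally, so a few salient tokens can be diluted by many uninformative ones.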
Model | SR@1 | SR@5 | SR@10 | MRR | NDCG |
---|---|---|---|---|---|
BofeCS | 0.848 | 0.942 | 0.972 | 0.821 | 0.857 |
BofeCS-collaborative fusion network | 0.745 | 0.924 | 0.969 | 0.743 | 0.785 |
BofeCS-residual structure | 0.716 | 0.927 | 0.979 | 0.668 | 0.743 |
BofeCS-Dropout structure | 0.351 | 0.521 | 0.655 | 0.363 | 0.426 |
Tab.6 Results of ablation experiments
Programming language | SR@1 | SR@5 | SR@10 | MRR | NDCG |
---|---|---|---|---|---|
Python | 0.815 | 0.850 | 0.961 | 0.986 | 0.857 |
JavaScript | 0.811 | 0.847 | 0.961 | 0.987 | 0.854 |
Ruby | 0.677 | 0.692 | 0.838 | 0.901 | 0.729 |
Go | 0.861 | 0.869 | 0.933 | 0.973 | 0.887 |
Java | 0.821 | 0.848 | 0.942 | 0.972 | 0.857 |
PHP | 0.899 | 0.917 | 0.971 | 0.987 | 0.919 |
Tab.7 Performance of BofeCS in six languages