Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (10): 2958-2963. DOI: 10.11772/j.issn.1001-9081.2017.10.2958

• Computer Software Technology •

Accurate search method for source code by combining syntactic and semantic queries

GU Yisheng1,2, ZENG Guosun1

  1. Department of Computer Science and Technology, Tongji University, Shanghai 200092, China;
  2. Embedded System and Service Computing Key Laboratory of Ministry of Education, Shanghai 200092, China
  • Received: 2017-04-21 Revised: 2017-06-09 Online: 2017-10-10 Published: 2017-10-16
  • Corresponding author: GU Yisheng, E-mail: qswy929@163.com
  • About the authors: GU Yisheng (born 1992), male, from Shanghai, master's student; research interests: software engineering and code search. ZENG Guosun (born 1964), male, from Ji'an, Jiangxi, professor, Ph.D., doctoral supervisor; research interests: parallel computing, trusted software, and information security.
  • Supported by:
    This work is partially supported by the Program of Shanghai Subject Chief Scientist (10XD1404400) and the Experimental Teaching Reform Project of Tongji University (0800104214).

Abstract: During software development and source code reuse, keyword-only search often fails to locate suitable source code. To address this problem, an accurate source code search method that combines syntactic and semantic queries is proposed. Firstly, based on the objectivity and uniqueness of source code syntax and semantics, the syntactic structure and the input/output (I/O) semantics of a function are added to the user's query, and a concrete query format is specified. Secondly, a syntactic structure matching algorithm, an I/O semantic matching algorithm, a keyword-compatible matching algorithm, and an algorithm for calculating the reliability of search results are designed. Finally, these algorithms are combined to realize accurate source code search. Test results show that, compared with plain keyword-based search, the proposed method improves the Mean Reciprocal Rank (MRR) by more than 62%, which helps achieve accurate source code search.

Key words: software programming, source code reuse, syntax and semantics, matching search
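
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how Mean Reciprocal Rank (MRR) is commonly computed: for each query, take the reciprocal of the rank of the first relevant result (0 if none is found), then average over all queries. The function name and the sample ranks are hypothetical and chosen only for illustration; they are not the paper's implementation or data, and the relative-gain comparison is just one possible way to compare two MRR values.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Return the MRR over a set of queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None if no relevant result was returned (contributes 0).
    """
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in first_relevant_ranks:
        if rank is None:
            continue  # no relevant result: reciprocal rank counts as 0
        if rank < 1:
            raise ValueError("ranks are 1-based and must be >= 1")
        total += 1.0 / rank
    return total / len(first_relevant_ranks)


# Hypothetical ranks for four queries (illustrative only, not from the paper).
keyword_only_ranks = [2, 4, None, 1]
combined_query_ranks = [1, 2, 4, 1]

mrr_keyword = mean_reciprocal_rank(keyword_only_ranks)
mrr_combined = mean_reciprocal_rank(combined_query_ranks)

# Relative gain of one MRR over the other (assumed comparison style).
relative_gain = (mrr_combined - mrr_keyword) / mrr_keyword

print(f"keyword-only MRR: {mrr_keyword:.4f}")
print(f"combined MRR:     {mrr_combined:.4f}")
print(f"relative gain:    {relative_gain:.1%}")
```

An MRR closer to 1 means the first suitable code fragment tends to appear nearer the top of the result list.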

CLC Number: