Journal of Computer Applications, 2018, Vol. 38, Issue (3): 812-817. DOI: 10.11772/j.issn.1001-9081.2017082043

• Computer software technology •

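Optimization of source code search based on multi-feature weight assignment

LI Zhen, NIU Jun, WANG Kui, XIN Yuanyuan

  1. College of Information Science and Engineering, Ningbo University, Ningbo Zhejiang 315211, China
  • Received: 2017-08-31; Revised: 2017-11-05; Published: 2018-03-10; Released: 2018-03-07
  • Corresponding author: LI Zhen
  • About the authors: LI Zhen (1992-), male, from Fuyang, Anhui, M.S. candidate, research interest: open-source search; NIU Jun (1976-), male, from Langzhong, Sichuan, associate professor, Ph.D., research interests: services computing, open-source search; WANG Kui (1988-), male, from Jiujiang, Jiangxi, M.S. candidate, research interest: open-source search; XIN Yuanyuan (1993-), female, from Xinyang, Henan, M.S. candidate, research interest: open-source search.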


Abstract: Accurate search of open-source code is a prerequisite for code reuse. Existing keyword-based search methods consider only matches against method signatures. Source code comments describe what a method does semantically, so a keyword-based search method that also takes code comments into account was proposed. Combined code features, such as method signatures and the various types of comments, were identified from the Abstract Syntax Tree (AST) generated from the source code. The code features and the query were each represented as vectors, and the cosine similarity between the vectors was computed. On that basis, a scoring mechanism that assigns weights to the multiple features of each search result was defined. The search results were ranked by score to obtain an ordered list of methods relevant to the query, reflecting the association between each method's code features and the query. The experimental results demonstrate that combining multiple code features under different weights can improve the accuracy of source code search.

Key words: code reuse, code comments, method signature, abstract syntax tree, code feature
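The pipeline summarized in the abstract has four steps. It extracts features from an AST, turns the features and the query into vectors, computes a cosine similarity per feature, and combines the per-feature similarities with weights before ranking. It can be illustrated with the minimal Python sketch below. This is not the authors' implementation and rests on several assumptions:

- It parses Python source with the standard `ast` module as a stand-in for whatever AST the authors generate.
- Python's AST discards ordinary comments, so only two features are kept: the signature (name plus parameter names) and the docstring.
- Vectors are raw term-frequency bags rather than any particular weighting scheme.
- The final score is assumed to be a weighted linear sum.
- The weights (0.6/0.4) and the names `WEIGHTS`, `tokenize`, `extract_features`, `cosine` and `rank` are hypothetical.

```python
import ast
import math
import re
from collections import Counter

# Hypothetical feature weights for illustration only; they are not the paper's values.
WEIGHTS = {"signature": 0.6, "docstring": 0.4}

# Splits identifiers and prose into words: "parseJSONFile" -> parse, JSON, File.
TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; underscores and punctuation act as separators."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def extract_features(source: str) -> dict[str, dict[str, str]]:
    """Collect per-function text features from a Python AST.

    Stand-in for the paper's AST step: Python's ast drops ordinary # comments,
    so only the signature (name + parameter names) and the docstring are kept.
    """
    features = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            params = " ".join(arg.arg for arg in node.args.args)
            features[node.name] = {
                "signature": f"{node.name} {params}",
                "docstring": ast.get_docstring(node) or "",
            }
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, source: str, weights: dict[str, float] = WEIGHTS) -> list[tuple[str, float]]:
    """Score each function as a weighted sum of per-feature cosine similarities, best first."""
    q_vec = Counter(tokenize(query))
    scored = []
    for name, feats in extract_features(source).items():
        score = sum(w * cosine(q_vec, Counter(tokenize(feats[f]))) for f, w in weights.items())
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


SAMPLE = '''
def read_file(path):
    """Read the whole text file and return its contents."""
    ...

def parse_json(text):
    """Parse a JSON string into a dictionary."""
    ...
'''

if __name__ == "__main__":
    for name, score in rank("read text file", SAMPLE):
        print(f"{name}: {score:.3f}")
    # read_file: 0.631
    # parse_json: 0.200
```

Changing the entries of `WEIGHTS` shifts how much the signature text versus the comment text contributes to a method's score. The abstract reports that this kind of per-feature weighting can improve accuracy. The concrete weights, the term-frequency vector model and the linear combination used here are illustrative assumptions only.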
