Journal of Computer Applications, 2016, Vol. 36, Issue (7): 2031-2037. DOI: 10.11772/j.issn.1001-9081.2016.07.2031

• Computer Software Technology •

Clone group mapping method based on improved vector space model

CHEN Zhuo, ZHANG Liping, WANG Huan, ZHANG Jiujie, WANG Chunhui

  1. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot, Nei Mongol 010022, China
  • Received: 2015-12-28 Revised: 2016-03-11 Online: 2016-07-10 Published: 2016-07-14
  • Corresponding author: ZHANG Liping
  • About the authors: CHEN Zhuo (born 1989), male, from Heze, Shandong, M.S. candidate; main research interests: software engineering, software analysis. ZHANG Liping (born 1974), female, from Hohhot, Inner Mongolia, professor, CCF member, M.S.; main research interests: software engineering, software analysis. WANG Huan (born 1991), male, from Bayannur, Inner Mongolia, M.S. candidate; main research interests: software engineering, software analysis. ZHANG Jiujie (born 1990), male, from Chifeng, Inner Mongolia, M.S. candidate; main research interest: software analysis. WANG Chunhui (born 1979), female (Mongolian ethnicity), from Tongliao, Inner Mongolia, lecturer, M.S., CCF member; main research interests: software analysis, multimedia, computer-aided instruction.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61363017, 61462071), the Natural Science Foundation of Inner Mongolia, China (2014MS0613), and the Foundation Projects of Inner Mongolia Education Department (NJZY14039).

Abstract: To address the scarcity and low efficiency of mapping methods for Type-3 code clones, a clone group mapping method based on an improved Vector Space Model (VSM) was proposed. Introducing the improved VSM into code clone analysis yields a method that can effectively map Type-1, Type-2 and Type-3 clones. Firstly, the clone group documents were preprocessed to obtain code documents with useless words removed, and feature items such as the file name and function name were extracted from them at the same time. Secondly, the word frequency vector space of the clone groups was extracted and built, and the clone group similarity was calculated with the cosine algorithm. Then, the clone group mapping was constructed by combining clone group similarity with feature item matching, giving the final mapping result. Experiments on five open source software systems, with manual verification of the results, show that the proposed method achieves a recall of no less than 96.1% and a precision of no less than 97.1% at low time cost. The experimental results demonstrate the feasibility of the proposed method, which provides data support for subsequent software evolution analysis.

Key words: code clone, clone group mapping, Vector Space Model (VSM), feature item, word frequency
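
The pipeline summarized in the abstract (preprocessing, word frequency vectors, cosine similarity, then feature item matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the VSM was improved, so plain unweighted term frequencies are used here, and the stop list, the 0.8 threshold, the dictionary layout, and all function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical stop list standing in for the "useless words" removed in preprocessing.
STOP_WORDS = {"int", "return", "if", "else", "for", "while", "void"}


def term_frequencies(code):
    """Tokenize code into identifier-like words and count them, skipping stop words."""
    tokens = re.findall(r"[A-Za-z_]\w*", code)
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def map_clone_groups(old_groups, new_groups, threshold=0.8):
    """Map each old clone group to the most similar new group sharing a feature item.

    Each group is a dict with "code" (str) and "features" (a set of
    (file name, function name) pairs). The threshold is an assumed value.
    Groups with no acceptable match map to None.
    """
    mapping = {}
    for old_id, old in old_groups.items():
        old_tf = term_frequencies(old["code"])
        best_id, best_sim = None, threshold
        for new_id, new in new_groups.items():
            # Feature item check: require at least one shared (file, function) pair.
            if not old["features"] & new["features"]:
                continue
            sim = cosine_similarity(old_tf, term_frequencies(new["code"]))
            if sim >= best_sim:
                best_id, best_sim = new_id, sim
        mapping[old_id] = best_id
    return mapping
```

In this sketch a clone group whose code gained an extra statement between versions (a Type-3 style change) still maps to its successor, because the term-frequency vectors remain close while the file and function names agree.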
