Journal of Computer Applications, 2016, Vol. 36, Issue 7: 2021-2030. DOI: 10.11772/j.issn.1001-9081.2016.07.2021

• Computer Software Technology •

Evolution pattern recognition and genealogy construction based on clone mapping of versions

ZHANG Jiujie, ZHAI Ye, WANG Chunhui, ZHANG Liping, LIU Dongsheng

  1. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot, Nei Mongol 010022, China
  • Received: 2016-01-08 Revised: 2016-03-14 Online: 2016-07-10 Published: 2016-07-14
  • Corresponding author: LIU Dongsheng
  • About the authors: ZHANG Jiujie (born 1990), male, from Chifeng, Inner Mongolia, M.S. candidate; research interest: software analysis. ZHAI Ye (born 1976), female, from Hohhot, Inner Mongolia, lecturer, M.S.; research interest: software analysis. WANG Chunhui (born 1979), female (Mongolian ethnicity), from Tongliao, Inner Mongolia, lecturer, M.S., CCF member; research interests: software analysis, multimedia, computer-aided instruction. ZHANG Liping (born 1974), female, from Hohhot, Inner Mongolia, professor, M.S., CCF member; research interests: software engineering, software analysis. LIU Dongsheng (born 1956), male, from Hohhot, Inner Mongolia, professor, CCF member; research interests: software engineering, computer education.
  • Supported by:
    This work is supported by the National Natural Science Foundation of China (61363017, 61462071), the Natural Science Foundation of Inner Mongolia, China (2014MS0613, 2015MS0606), and the Foundation for Science Research Program in Higher Education of Inner Mongolia, China (NJZY14039, NJZY16046).

Abstract: Existing methods for building clone genealogies are complicated, and the known set of clone evolution patterns needs to be extended. To address these problems, new evolution patterns for code clones are proposed, and clone genealogies are built automatically from the mapping relationships of code clones between software versions. First, clone detection is performed on each released version, and the topics of the code clones are extracted using Latent Dirichlet Allocation (LDA). Second, the mapping relationships of code clones between versions are determined from the similarity of their topics. Third, evolution patterns are assigned to code clones according to these mapping relationships, and their evolution characteristics are analyzed. Finally, the clone genealogy is built by combining the mapping information with the evolution patterns. Clone genealogies were built for four open-source systems. The results show that the proposed approach is feasible and that the newly proposed evolution patterns do occur during clone evolution. Furthermore, about 90% of code clones remain stable during software evolution, and approximately 67% of clone groups survive for no more than half of the released versions. The experimental conclusions and the accompanying analysis provide strong support for future research on code clones and for their maintenance and management.

Key words: code clone, topic modeling, software evolution, evolution pattern, clone genealogy, software maintenance
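
The mapping and genealogy steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each clone group already has an LDA topic-weight vector, and it assumes cosine similarity with a threshold of 0.8 as the matching rule, since the abstract does not specify the similarity measure. The names `cosine`, `map_clone_groups`, `build_genealogy`, and the sample `versions` data are hypothetical.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two equal-length topic-weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def map_clone_groups(prev, curr, threshold=0.8):
    """Map each clone group in `prev` to its most similar group in `curr`.

    `prev` and `curr` are dicts: clone-group id -> topic vector.
    A group whose best similarity is below `threshold` (an assumed value)
    stays unmapped, i.e. it is treated as having disappeared.
    """
    mapping = {}
    for gid, topics in prev.items():
        best_id, best_sim = None, threshold
        for cid, ctopics in curr.items():
            sim = cosine(topics, ctopics)
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None:
            mapping[gid] = best_id
    return mapping


def build_genealogy(versions, threshold=0.8):
    """Chain the mappings of adjacent versions into a list of genealogy edges.

    Each edge is ((version index, group id), (next version index, group id)).
    """
    edges = []
    for i in range(len(versions) - 1):
        mapping = map_clone_groups(versions[i], versions[i + 1], threshold)
        for src, dst in mapping.items():
            edges.append(((i, src), (i + 1, dst)))
    return edges


# Hypothetical topic vectors for the clone groups of two releases.
versions = [
    {"g1": [0.9, 0.1, 0.0], "g2": [0.1, 0.8, 0.1]},
    {"g1": [0.85, 0.15, 0.0], "g3": [0.0, 0.1, 0.9]},
]
genealogy = build_genealogy(versions)
```

In this toy data, `g1` keeps almost the same topic distribution across the two releases and is linked, while `g2` has no sufficiently similar successor and `g3` has no predecessor. Evolution patterns would then be assigned by inspecting such edges; the abstract does not enumerate the patterns, so none are encoded here.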

CLC number: