Journal of Computer Applications, 2016, Vol. 36, Issue (1): 260-265. DOI: 10.11772/j.issn.1001-9081.2016.01.0260


Harmfulness prediction of clone code based on Bayesian network

ZHANG Liping, ZHANG Ruixia, WANG Huan, YAN Sheng   

  1. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot, Nei Mongol 010022, China
  • Received: 2015-07-07 Revised: 2015-09-22 Online: 2016-01-10 Published: 2016-01-09
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61363017, 61462071) and the Inner Mongolia Natural Science Foundation of China (2014MS0613).

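  • Corresponding author: ZHANG Liping (born 1974), female, from Hohhot, Inner Mongolia; professor; CCF member; research interests: software engineering and software analysis.
  • About the authors: ZHANG Ruixia (born 1989), female, from Ulanqab, Inner Mongolia; M.S. candidate; research interest: software analysis. WANG Huan (born 1991), male, from Bayannur, Inner Mongolia; M.S. candidate; research interest: code analysis. YAN Sheng (born 1984), male, from Baotou, Inner Mongolia; lecturer; research interests: software analysis and parallel computing.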

Abstract: During software development, programmers' copy-and-paste activities produce a large number of code clones, and clones that undergo inconsistent changes are often harmful to the program. To address this problem and effectively find harmful code clones, a method for predicting clone harmfulness based on a Bayesian network was proposed. First, drawing on related research in software defect prediction and clone evolution, two categories of metrics, static metrics and evolution metrics, were proposed to characterize code clones. Then, the prediction model was constructed by using the core algorithm of the Bayesian network. Finally, the probability that a code clone is harmful was predicted. The prediction model was evaluated on five open-source software systems written in C, comprising 99 versions in total. The experimental results show that the proposed method can predict clone harmfulness with better applicability and higher accuracy, thereby reducing the threat that harmful code clones pose to software and improving software quality.

Key words: clone code, harmfulness prediction, Bayesian network, clone evolution, machine learning
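
As a rough illustration of the idea in the abstract, the following minimal sketch predicts the probability that a clone is harmful from discretized features. It is not the authors' model: it assumes the simplest Bayesian network structure (naive Bayes, where the class node is the only parent of every feature node), and the feature names (`size`, `changes`), their values, and the toy training data are all hypothetical.

```python
from collections import Counter, defaultdict


def train(samples, labels, alpha=1.0):
    """Count class and feature-value frequencies for a naive Bayes model.

    Naive Bayes is used here as an assumed, simplest-case Bayesian network;
    the structure used in the paper may differ.
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, label) -> value counts
    values = defaultdict(set)            # feature -> values seen in training
    for x, y in zip(samples, labels):
        for feature, value in x.items():
            value_counts[(feature, y)][value] += 1
            values[feature].add(value)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "values": values, "alpha": alpha}


def predict_proba(model, x):
    """Return the normalized posterior P(label | features) for one clone."""
    alpha = model["alpha"]
    total = sum(model["class_counts"].values())
    scores = {}
    for y, n_y in model["class_counts"].items():
        p = n_y / total  # class prior
        for feature, value in x.items():
            k = len(model["values"][feature])
            count = model["value_counts"][(feature, y)][value]
            # Laplace-smoothed conditional probability P(value | label)
            p *= (count + alpha) / (n_y + alpha * k)
        scores[y] = p
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}


# Hypothetical toy data: one static feature (size) and one evolution
# feature (changes) per clone. Not taken from the paper's data set.
samples = [
    {"size": "large", "changes": "inconsistent"},
    {"size": "large", "changes": "inconsistent"},
    {"size": "small", "changes": "consistent"},
    {"size": "small", "changes": "none"},
    {"size": "large", "changes": "consistent"},
]
labels = ["harmful", "harmful", "harmless", "harmless", "harmless"]

model = train(samples, labels)
probs = predict_proba(model, {"size": "large", "changes": "inconsistent"})
print(probs)
```

In this toy setting, a large clone with inconsistent changes receives a higher harmful probability than harmless probability. A realistic model would learn from clone genealogies extracted across many versions and would likely use a richer network structure than naive Bayes.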

