Search Result

Clone code detection based on image similarity

WANG Yafang, LIU Dongsheng, HOU Min

Journal of Computer Applications 2019, 39 (7): 2074-2080. DOI: 10.11772/j.issn.1001-9081.2019010083

Existing research on clone code detection works mainly from four perspectives: text, lexical tokens, syntax, and semantics. However, detection effectiveness has seen few breakthroughs for a long time. To address this problem, a new method called Clone Code detection based on Image Similarity (CCIS) was proposed. First, the source code was preprocessed by removing comments, white space, and similar noise to obtain "clean" function fragments, and the identifiers, keywords, and other elements in each function were highlighted. Then the processed source code was converted into images, and the images were normalized. Finally, Jaccard distance and a perceptual hash algorithm were used to extract clone code information from these images. To validate the method, six open-source software projects were used to build the evaluation dataset. The experimental results show that CCIS detects 100% of Type-1, 88% of Type-2, and 60% of Type-3 clone code, which indicates that the method is effective for clone code detection.
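The image-comparison step can be illustrated with a minimal sketch. It is not the CCIS implementation: the "image" here is a hypothetical binary grid with one pixel per character, the grid size is an arbitrary assumption, and a simple average hash stands in for the perceptual hash named in the abstract. All function names are illustrative.

```python
def to_bitmap(code, width=32, height=16):
    """Render code as a fixed-size binary grid: 1 where a non-space character sits."""
    lines = [ln for ln in code.splitlines() if ln.strip()]  # drop blank lines
    grid = [[0] * width for _ in range(height)]
    for r, line in enumerate(lines[:height]):
        for c, ch in enumerate(line[:width]):
            if not ch.isspace():
                grid[r][c] = 1
    return grid


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the sets of 'on' pixels of two grids."""
    on_a = {(r, c) for r, row in enumerate(a) for c, v in enumerate(row) if v}
    on_b = {(r, c) for r, row in enumerate(b) for c, v in enumerate(row) if v}
    union = on_a | on_b
    if not union:
        return 0.0
    return 1 - len(on_a & on_b) / len(union)


def average_hash(grid, block=4):
    """Sum each block x block cell, then threshold every cell at the mean (assumed stand-in for pHash)."""
    cells = []
    for r0 in range(0, len(grid), block):
        for c0 in range(0, len(grid[0]), block):
            cells.append(sum(grid[r][c]
                             for r in range(r0, r0 + block)
                             for c in range(c0, c0 + block)))
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def hamming(h1, h2):
    """Number of differing hash bits."""
    return sum(x != y for x, y in zip(h1, h2))
```

Two fragments would then be reported as clone candidates when both the Jaccard distance and the Hamming distance between their hashes fall below thresholds chosen for the dataset; the abstract does not state which thresholds were used.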

Feature selection model for harmfulness prediction of clone code

WANG Huan, ZHANG Liping, YAN Sheng, LIU Dongsheng

Journal of Computer Applications 2017, 37 (4): 1135-1142. DOI: 10.11772/j.issn.1001-9081.2017.04.1135

To solve the problem of irrelevant and redundant features in harmfulness prediction of clone code, a combined feature selection model for clone code harmfulness was proposed based on relevance and influence. First, the features were preliminarily ranked by relevance using the information gain ratio; highly relevant features were retained and irrelevant ones were removed to reduce the feature search space. Next, the optimal feature subset was determined by using the wrapper sequential floating forward selection algorithm combined with six classifiers, including Naive Bayes. Finally, the different feature selection methods were analyzed, and the feature data was analyzed, filtered, and optimized by exploiting the advantages of each method under different selection criteria. Experimental results show that prediction accuracy increases by 15.2-34 percentage points after feature selection; compared with other feature selection methods, the F1-measure of this method increases by 1.1-10.1 percentage points and the AUC increases by 0.7-22.1 percentage points. As a result, this method can greatly improve the accuracy of the harmfulness prediction model.
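The first, filter-style stage can be sketched as follows. This is a minimal illustration of ranking discrete features by information gain ratio, assuming categorical feature values; the function names and the `keep` parameter are hypothetical, and the wrapper stage (sequential floating forward selection with classifiers) is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    total = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(columns, labels, keep):
    """Return the names of the `keep` features with the highest gain ratio."""
    ranked = sorted(columns, key=lambda name: gain_ratio(columns[name], labels),
                    reverse=True)
    return ranked[:keep]
```

A feature that splits the labels perfectly scores 1.0, while one that is independent of the labels scores 0.0, so truncating the ranking shrinks the space the wrapper search has to explore.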

Evolution pattern recognition and genealogy construction based on clone mapping of versions

ZHANG Jiujie, ZHAI Ye, WANG Chunhui, ZHANG Liping, LIU Dongsheng

Journal of Computer Applications 2016, 36 (7): 2021-2030. DOI: 10.11772/j.issn.1001-9081.2016.07.2021

To address the complexity of existing methods for building clone genealogies and the pressing need for additional evolution patterns, new clone evolution patterns were proposed, and clone genealogies were built automatically based on the mapping relationships of code clones between versions. First, topics of code clones were extracted using Latent Dirichlet Allocation (LDA) from the clone detection results of each released software version. Second, mapping relationships of code clones between versions were established from the similarities of the topics. Third, evolution patterns were attached to code clones according to the existing mapping relationships, and evolution features were analyzed. Finally, the clone genealogy was built by integrating the mapping relationships and evolution patterns. Experiments on building clone genealogies were conducted on four open-source systems. The experimental results show that the proposed approach is feasible and that the proposed evolution patterns do occur during software evolution. Furthermore, about 90% of code clones in the software systems are stable during evolution, and approximately 67% of clone groups survive for fewer than half of the release versions. The experimental conclusions and related analysis provide strong support for future research as well as for the maintenance and management of code clones.
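The second step, mapping clones between versions by topic similarity, can be sketched as below. This sketch assumes each clone already has a topic distribution (for example, from an LDA model trained separately) and uses cosine similarity with a threshold of 0.8; the similarity measure, the threshold, and all names are assumptions, since the abstract does not specify them.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_clones(old, new, threshold=0.8):
    """Map each clone id in `old` to its most similar clone id in `new`.

    `old` and `new` are dicts of clone id -> topic distribution.
    A clone with no match at or above `threshold` maps to None.
    """
    mapping = {}
    for old_id, old_vec in old.items():
        best_id, best_sim = None, threshold
        for new_id, new_vec in new.items():
            sim = cosine(old_vec, new_vec)
            if sim >= best_sim:
                best_id, best_sim = new_id, sim
        mapping[old_id] = best_id
    return mapping
```

A clone that maps to None has no counterpart in the next version, which is the kind of observation an evolution pattern could then be attached to.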

Clone genealogies extraction based on software evolution over multiple versions

TU Ying, ZHANG Liping, WANG Chunhui, HOU Min, LIU Dongsheng

Journal of Computer Applications 2015, 35 (4): 1169-1173. DOI: 10.11772/j.issn.1001-9081.2015.04.1169

Since clone detection results alone cannot fully reflect the features of clones, extracting clone genealogies from multiple versions can uncover the patterns and characteristics that clones exhibit in an evolving system. A clone genealogy extraction method named FCG was proposed. FCG first mapped clones between each pair of adjacent versions and then identified clone evolution patterns. All of the results were combined to obtain clone genealogies. Experiments on six open-source systems found that the average lifetime of clones in the current version is over 70 percent of the total number of studied versions, and that most of them do not change, which indicates that the majority of clones can be well maintained. However, some unstable clones may be defect-prone and need to be modified or refactored. Results show that FCG can efficiently extract clone genealogies, which contributes to a better understanding of clones and provides insights into their targeted management.
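Combining adjacent-version mappings into genealogies can be sketched as follows. This is a simplified illustration rather than FCG itself: it only follows clones that exist in the first version, assumes each clone maps to at most one successor, and omits evolution pattern identification. The function names are hypothetical.

```python
def build_genealogies(first_version_ids, mappings):
    """Chain adjacent-version mappings into one genealogy per starting clone.

    mappings[i] maps a clone id in version i to its id in version i + 1,
    or to None when the clone disappears.
    """
    genealogies = []
    for start in first_version_ids:
        chain = [start]
        for mapping in mappings:
            successor = mapping.get(chain[-1])
            if successor is None:
                break  # the clone does not survive into the next version
            chain.append(successor)
        genealogies.append(chain)
    return genealogies


def lifetime_ratio(chain, n_versions):
    """Fraction of the studied versions in which the clone is present."""
    return len(chain) / n_versions
```

With three versions, a clone whose chain has three entries has a lifetime ratio of 1.0, which corresponds to the lifetime measure reported in the abstract.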

Clone code detection based on Levenshtein distance of token

ZHANG Jiujie, WANG Chunhui, ZHANG Liping, HOU Min, LIU Dongsheng

Journal of Computer Applications 2015, 35 (12): 3536-3543. DOI: 10.11772/j.issn.1001-9081.2015.12.3536

To address the scarcity and low efficiency of current Type-3 clone code detection tools, an effective Type-3 clone code detection method based on the Levenshtein distance of tokens was proposed. The proposed method can detect Type-1, Type-2, and Type-3 clone code efficiently. First, the source code of a subject system was tokenized into token sequences of a specified code size. Second, each fixed-size substring of the token sequences was mapped to a corresponding index. Third, clone pairs were built with the Levenshtein distance algorithm, and clone groups were built with the disjoint-set algorithm, on the basis of queries against the mapping information. Finally, feedback information about the clone code was reported. A prototype tool named FClones was implemented. It was evaluated with a code mutation-based framework and compared with two state-of-the-art tools, SimCad and NiCad. The experimental results show that the recall of FClones is equal to or greater than 95% and its precision is not lower than 98% in detecting all three types of clone code. FClones detects Type-3 clones better than the other tools.
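The pairing and grouping steps can be illustrated with a minimal sketch. It is not FClones: it compares every pair of fragments directly instead of using the fixed-size substring index described in the abstract, and the tokenizer regex, the similarity normalization, and the 0.7 threshold are all assumptions.

```python
import re


def tokenize(code):
    """Split code into identifiers, numbers, and single punctuation tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def levenshtein(a, b):
    """Edit distance between two token sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        cur = [i]
        for j, tb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ta
                           cur[j - 1] + 1,             # insert tb
                           prev[j - 1] + (ta != tb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Edit distance normalized to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def clone_groups(fragments, threshold=0.7):
    """Union fragments whose token similarity reaches `threshold` (disjoint set)."""
    tokens = [tokenize(f) for f in fragments]
    parent = list(range(len(fragments)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if similarity(tokens[i], tokens[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(fragments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Working on tokens rather than characters means an inserted or deleted statement costs only a few edits, which is what lets an edit-distance threshold tolerate the small additions and deletions that characterize Type-3 clones.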

Predicting inconsistent change probability of code clone based on latent Dirichlet allocation model

YI Lili, ZHANG Liping, WANG Chunhui, TU Ying, LIU Dongsheng

Journal of Computer Applications 2014, 34 (6): 1788-1791. DOI: 10.11772/j.issn.1001-9081.2014.06.1788

Programmer activities such as copying, pasting, and modifying code produce a large number of code clones in software systems. Inconsistent changes to code clones are a main cause of program errors and increased maintenance costs as software versions evolve. To solve this problem, a new method was proposed. First, the mapping relationships between clone groups were built. Then the topics of the lineal clone clusters were extracted using the Latent Dirichlet Allocation (LDA) model. Finally, the probability of inconsistent change of code clones was predicted. A software system with eight versions was tested, and clear discrimination was obtained. The experimental results show that the method can effectively predict the probability of inconsistent change and can be used to evaluate the quality and credibility of software.
