Search Result

Learning sample extraction method based on convex boundary

GU Yiyi, TAN Xuntao, YUAN Yubo

Journal of Computer Applications 2019, 39 (8): 2281-2287. DOI: 10.11772/j.issn.1001-9081.2019010162

The quality and quantity of learning samples are critical for intelligent data classification systems, but there is no generally effective method for finding meaningful samples. To address this, the concept of the convex boundary of a dataset was proposed, and a fast method for discovering a meaningful sample set was given. Firstly, abnormal and incomplete samples in the learning sample set were cleaned with a box-plot function. Secondly, the concept of a data cone was proposed to divide the normalized learning samples into cones. Finally, the sample subset in each cone was centralized, and samples differing very little from the convex boundary were extracted to form the convex boundary sample set. In the experiments, 6 classical data classification algorithms, including Gaussian Naive Bayes (GNB), Classification And Regression Tree (CART), Linear Discriminant Analysis (LDA), Adaptive Boosting (AdaBoost), Random Forest (RF) and Logistic Regression (LR), were tested on 12 UCI datasets. The results show that convex boundary sample sets can significantly shorten the training time of each algorithm while maintaining classification performance. In particular, for datasets with much noisy data, such as the caesarian section, electrical grid and car evaluation datasets, the convex boundary sample set can improve classification performance. To better evaluate the efficiency of the convex boundary sample set, the sample cleaning efficiency was defined as the quotient of the sample size change rate and the classification performance change rate. With this index, the significance of convex boundary samples was evaluated objectively. A cleaning efficiency greater than 1 indicates that the method is effective, and the higher the value, the better the effect of using convex boundary samples as learning samples. For example, on the HTRU2 dataset, the cleaning efficiency of the proposed method for the GNB algorithm is over 68, which demonstrates the strong performance of the method.
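
The cleaning efficiency index can be illustrated with a small sketch. The abstract only states that it is the quotient of the sample size change rate and the classification performance change rate; the exact form of each rate is not given here, so the relative-change definition below, the function names and the example numbers are all assumptions made for illustration, not the authors' formula.

```python
def change_rate(before: float, after: float) -> float:
    """Relative change |before - after| / |before| (an assumed definition)."""
    if before == 0:
        raise ValueError("'before' must be non-zero")
    return abs(before - after) / abs(before)


def cleaning_efficiency(n_before: int, n_after: int,
                        score_before: float, score_after: float) -> float:
    """Quotient of sample size change rate and performance change rate.

    n_before / n_after: number of learning samples before and after extraction.
    score_before / score_after: a classification score (e.g. accuracy) obtained
    with the full sample set and with the extracted sample set.
    """
    size_rate = change_rate(n_before, n_after)
    perf_rate = change_rate(score_before, score_after)
    if perf_rate == 0:
        # Performance did not change at all: treat the efficiency as unbounded.
        return float("inf")
    return size_rate / perf_rate


# Hypothetical numbers: 70% of the samples are dropped while accuracy
# moves from 0.90 to 0.89, giving an efficiency of about 63.
efficiency = cleaning_efficiency(1000, 300, 0.90, 0.89)
```

Under this reading, a value above 1 means the sample set shrank proportionally more than the classification score changed.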

Impact of regression algorithms on performance of defect number prediction model

FU Zhongwang, XIAO Rong, YU Xiao, GU Yi

Journal of Computer Applications 2018, 38 (3): 824-828. DOI: 10.11772/j.issn.1001-9081.2017081935

Existing studies do not consider the imbalanced data distribution in defect datasets and employ improper performance measures to evaluate regression models for predicting the number of defects. To address this issue, the impact of different regression algorithms on models for predicting the number of defects was explored by using Fault-Percentile-Average (FPA) as the performance measure. Experiments were conducted on six datasets from the PROMISE repository to analyze the impact on the models and the differences among ten regression algorithms for predicting the number of defects. The results show that the prediction results of models built by different regression algorithms vary, and that the gradient boosting regression algorithm and the Bayesian ridge regression algorithm achieve better performance overall.
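
As a rough illustration of the FPA measure named above, the sketch below follows the commonly used definition: modules are ranked by predicted defect count, and FPA is the average, over every cutoff m, of the fraction of actual defects found in the top m predicted modules. The function name and the toy data are assumptions for illustration; the paper's own evaluation code is not shown in the abstract.

```python
from typing import Sequence


def fault_percentile_average(actual: Sequence[float],
                             predicted: Sequence[float]) -> float:
    """Average fraction of actual defects covered by the top-m predicted modules."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    total = sum(actual)
    if total <= 0:
        raise ValueError("the dataset must contain at least one defect")

    # Rank module indices from the highest predicted defect count to the lowest.
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)

    covered = 0.0   # actual defects contained in the top-m modules so far
    fpa_sum = 0.0
    for i in order:
        covered += actual[i]
        fpa_sum += covered / total
    return fpa_sum / len(actual)


# Toy example: a ranking that puts the most defective module first scores
# higher than the reversed ranking.
good = fault_percentile_average([0, 1, 3], [0.1, 0.5, 2.0])
bad = fault_percentile_average([0, 1, 3], [2.0, 0.5, 0.1])
```

Because FPA depends only on the ranking of modules, it is less sensitive than error-based measures to the many zero-defect modules in an imbalanced dataset.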

Network traffic classification based on Plane-Gaussian artificial neural network

YANG Xubing, FENG Zhe, GU Yifan, XUE Hui

Journal of Computer Applications 2017, 37 (3): 782-785. DOI: 10.11772/j.issn.1001-9081.2017.03.782

To address the problem of network flow monitoring (classification) in complex network environments, a stochastic artificial neural network learning method was proposed to classify multiple classes directly and to improve training speed. The idea of stochastic projection was introduced into the Plane-Gaussian (PG) artificial neural network model, and the network connection matrix was obtained analytically by calculating a pseudo-inverse. Theoretically, it can be proved that the network has global approximation ability. Simulations were carried out on artificial data and standard network flow monitoring data. Compared with the Extreme Learning Machine (ELM) and the PG network using the random method, the analysis and experimental results show that: 1) the proposed method inherits the geometric characteristics of the PG network and is more effective for data with a planar distribution; 2) its training speed is comparable to that of ELM and significantly faster than that of the PG network; 3) among the three methods, the proposed method is the most suitable for network flow monitoring.
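
For orientation, the pseudo-inverse step can be written in the standard ELM form. This is the generic ELM formulation, given here as an assumption about the general shape of the computation; the PG network replaces the activation with its own plane-Gaussian function, whose exact form is not stated in the abstract. The symbols X (inputs), W and b (randomly drawn hidden weights and biases), g (activation), T (one-hot class targets) and beta (output connection matrix) are introduced only for this illustration.

```latex
% Hidden-layer output for n samples with randomly fixed W and b
H = g\left(XW + \mathbf{1}\,b^{\top}\right)

% Output connection matrix: minimum-norm least-squares solution of H\beta = T,
% where H^{\dagger} is the Moore-Penrose pseudo-inverse of H
\hat{\beta} = H^{\dagger} T
```

Since only the output matrix is solved for, and in closed form, no iterative training is needed, which is consistent with the training speed reported as comparable to ELM.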

Accurate search method for source code by combining syntactic and semantic queries

GU Yisheng, ZENG Guosun

Journal of Computer Applications 2017, 37 (10): 2958-2963. DOI: 10.11772/j.issn.1001-9081.2017.10.2958

In programming and source code reuse, simple keyword-based code search often leads to inaccurate results, so an accurate search method for source code was proposed. Firstly, according to the objectivity and uniqueness of syntax and semantics, the syntactic structure and the I/O semantics of a function in source code were treated as part of a query. Such a query should be submitted in a regularized format. Secondly, the syntactic structure matching, I/O semantics matching and keyword-compatible matching algorithms, along with the reliability calculation algorithm, were designed. Finally, the accurate search method combining syntactic and semantic queries was realized by using the above algorithms. The test results show that the proposed method improves Mean Reciprocal Rank (MRR) by more than 62% compared with the common keyword-based search method, and that it is effective in improving the accuracy of source code search.
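
To make the reported metric concrete, the following minimal sketch computes MRR in its usual sense: for each query, take the reciprocal of the rank of the first relevant result (0 when nothing relevant is returned), then average over all queries. The function name and the sample ranks are illustrative assumptions, not data from the paper.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_hit_ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over queries; None means no relevant result was returned."""
    if len(first_hit_ranks) == 0:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in first_hit_ranks:
        if rank is None:
            continue  # a query with no relevant result contributes 0
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(first_hit_ranks)


# Four hypothetical queries: hits at ranks 1, 2 and 4, and one miss.
mrr = mean_reciprocal_rank([1, 2, None, 4])  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```

An MRR closer to 1 means the first relevant function tends to appear nearer the top of the result list.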