Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Learning sample extraction method based on convex boundary
GU Yiyi, TAN Xuntao, YUAN Yubo
Journal of Computer Applications    2019, 39 (8): 2281-2287.   DOI: 10.11772/j.issn.1001-9081.2019010162
Abstract570)      PDF (1258KB)(427)       Save
The quality and quantity of learning samples are very important for intelligent data classification systems. But there is no general good method for finding meaningful samples in data classification systems. For this reason, the concept of convex boundary of dataset was proposed, and a fast method of discovering meaningful sample set was given. Firstly, abnormal and incomplete samples in the learning sample set were cleaned by box-plot function. Secondly, the concept of data cone was proposed to divide the normalized learning samples into cones. Finally, each cone of sample subset was centralized, and based on convex boundary, samples with very small difference from convex boundary were extracted to form convex boundary sample set. In the experiments, 6 classical data classification algorithms, including Gaussian Naive Bayes (GNB), Classification And Regression Tree (CART), Linear Discriminant Analysis (LDA), Adaptive Boosting (AdaBoost), Random Forest (RF) and Logistic Regression (LR), were tested on 12 UCI datasets. The results show that convex boundary sample sets can significantly shorten the training time of each algorithm while maintaining the classification performance. In particular, for datasets with many noise data such as caesarian section, electrical grid, car evaluation datasets, convex boundary sample set can improve the classification performance. In order to better evaluate the efficiency of convex boundary sample set, the sample cleaning efficiency was defined as the quotient of sample size change rate and classification performance change rate. With this index, the significance of convex boundary samples was evaluated objectively. Cleaning efficiency greater than 1 proves that the method is effective. The higher the numerical value, the better the effect of using convex boundary samples as learning samples. For example, on the dataset of HTRU2, the cleaning efficiency of the proposed method for GNB algorithm is over 68, which proves the strong performance of this method.
Reference | Related Articles | Metrics
Impact of regression algorithms on performance of defect number prediction model
FU Zhongwang, XIAO Rong, YU Xiao, GU Yi
Journal of Computer Applications    2018, 38 (3): 824-828.   DOI: 10.11772/j.issn.1001-9081.2017081935
Abstract829)      PDF (932KB)(499)       Save
Focusing on the issue that the existing studies do not consider the imbalanced data distribution problem in defect datasets and employ improper performance measures to evaluate the performance of regression models for predicting the number of defects, the impact of different regression algorithms on models for predicting the number of defects were explored by using Fault-Percentile-Average (FPA) as the performance measure. Experiments were conducted on six datasets from PROMISE repository to analyze the impact on the models and the difference of ten regression algorithms for predicting the number of defects. The results show that the forecast results of models for predicting the number of defects built by different regression algorithms are various, and gradient boosting regression algorithm and Bayesian ridge regression algorithm can achieve better performance as a whole.
Reference | Related Articles | Metrics
Network traffic classification based on Plane-Gaussian artificial neural network
YANG Xubing, FENG Zhe, GU Yifan, XUE Hui
Journal of Computer Applications    2017, 37 (3): 782-785.   DOI: 10.11772/j.issn.1001-9081.2017.03.782
Abstract573)      PDF (792KB)(482)       Save
Aiming at the problems of network flow monitoring (classification) in complex network environment, a stochastic artificial neural network learning method was proposed to realize the direct classification of multiple classes and improve the training speed of learning methods. Using Plane-Gaussian (PG) artificial neural network model, the idea of stochastic projection was introduced, and the network connection matrix was obtained by calculating the pseudo-inverse analysis. Theoretically, it can be proved that the network has global approximation ability. The artificial simulation was carried out on artificial data and standard network flow monitoring data. Compared with the Extreme Learning Machine (ELM) and PG network using the random method, the analysis and experimental results show that: 1)the proposed method inherits the geometric characteristics of the PG network and is more effective for the planar distributed data; 2)it has comparable training speed to ELM, but significantly faster than PG network; 3)among the three methods, the proposed method is more suitable for solving the problem of network flow monitoring.
Reference | Related Articles | Metrics
Accurate search method for source code by combining syntactic and semantic queries
GU Yisheng, ZENG Guosun
Journal of Computer Applications    2017, 37 (10): 2958-2963.   DOI: 10.11772/j.issn.1001-9081.2017.10.2958
Abstract652)      PDF (985KB)(608)       Save
In the process of programming and source code reuse, since simple keyword-based code search often leads to inaccurate results, an accurate search method for source code was proposed. Firstly, according to the objectivity and uniqueness of syntax and semantics, the syntactic structure and semantics of I/O of a function in source code were considered as part of a query. Such query should be submitted following a regularized format. Secondly, the syntactic structure, semantics of I/O, keyword-compatible match algorithms along with the reliability calculation algorithm were designed. Finally, the accurate search method by combining syntactic and semantic queries was realized by using the above algorithms. The test result shows that the proposed method can improve Mean Reciprocal Rank (MRR) by more than 62% compared with the common keyword-based search method, and it is effective in improving the accuracy of source code search.
Reference | Related Articles | Metrics