Data driven parallel incremental support vector machine learning algorithm based on Hadoop framework

PI Wenjun, GONG Xiujun

Journal of Computer Applications 2016, 36 (11): 3044-3049. DOI: 10.11772/j.issn.1001-9081.2016.11.3044


The traditional Support Vector Machine (SVM) algorithm has difficulty handling large-scale training data. To address this, an efficient data-driven Parallel Incremental Adaboost-SVM (PIASVM) learning algorithm based on Hadoop was proposed. An ensemble system was used so that each classifier processes one partition of the data, and the classification results were then integrated to obtain the combined classifier. Weights were used to describe the spatial distribution properties of the samples and were iteratively updated during the incremental training stage, and a forgetting factor was applied to select new samples and eliminate historical samples. In addition, a controller component based on HBase was used to schedule the iterative procedure, persist intermediate results, and reduce the bandwidth pressure of iterative MapReduce. Experimental results on multiple data sets demonstrate that the proposed algorithm performs well in speedup, sizeup, and scaleup, and has high processing capacity for large-scale data while maintaining high accuracy.
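
The sample reweighting and forgetting steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so this is not the PIASVM implementation: it assumes the standard AdaBoost weight update and a simple exponential decay as the forgetting rule. The function names (`adaboost_reweight`, `apply_forgetting`) and the parameters `factor` and `keep_threshold` are hypothetical and chosen only for illustration.

```python
import math


def adaboost_reweight(weights, correct):
    """One standard AdaBoost update (assumed, not taken from the paper).

    weights: current sample weights.
    correct: booleans, True where the current classifier was right.
    Returns the normalized new weights and the classifier's vote weight.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)  # clamp so the log stays finite
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples gain weight, correctly classified ones lose it.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


def apply_forgetting(weights, ages, factor=0.8, keep_threshold=0.05):
    """Hypothetical forgetting rule: decay each weight by factor**age.

    ages: number of incremental rounds since each sample arrived.
    Returns the indices of samples whose normalized decayed weight is
    still at least keep_threshold; the rest are treated as forgotten.
    """
    decayed = [w * factor ** age for w, age in zip(weights, ages)]
    total = sum(decayed)
    return [i for i, w in enumerate(decayed) if w / total >= keep_threshold]


if __name__ == "__main__":
    new_weights, alpha = adaboost_reweight([0.25] * 4, [True, True, True, False])
    kept = apply_forgetting(new_weights, ages=[3, 0, 0, 0], factor=0.5)
    print(new_weights, alpha, kept)
```

In a Hadoop setting, each mapper would presumably run such an update on its own data partition, with the retained sample indices and vote weights persisted between iterations; that orchestration is outside the scope of this sketch.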
