Data driven parallel incremental support vector machine learning algorithm based on Hadoop framework
PI Wenjun, GONG Xiujun
Journal of Computer Applications, 2016, 36(11): 3044-3049. DOI: 10.11772/j.issn.1001-9081.2016.11.3044
Abstract
Traditional Support Vector Machine (SVM) algorithms have difficulty handling large-scale training data. To address this, an efficient data-driven Parallel Incremental Adaboost-SVM (PIASVM) learning algorithm based on Hadoop was proposed. An ensemble system was used in which each classifier processes one partition of the data, and the classification results are then integrated into a combined classifier. Sample weights were used to describe the spatial distribution properties of the samples; these weights are iteratively updated during the incremental training stage, and a forgetting factor is applied to select new samples and eliminate historical ones. In addition, a controller component based on HBase was used to schedule the iterative procedure, persist the intermediate results, and reduce the bandwidth pressure of iterative MapReduce. Experimental results on multiple data sets demonstrate that the proposed algorithm achieves good speedup, sizeup, and scaleup, and can process large-scale data efficiently while maintaining high accuracy.
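The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of how AdaBoost-style sample reweighting and a forgetting factor might fit together in incremental SVM training. It assumes: a linear SVM trained with a weighted variant of Pegasos (stochastic subgradient descent on the hinge loss) as the base learner, which is a swapped-in choice and not necessarily the paper's solver; the standard AdaBoost vote weight alpha = 0.5 ln((1 - err) / err); and a forgetting factor modeled as multiplicative decay `beta` with a hypothetical cutoff `min_weight` below which historical samples are dropped. Each batch stands in for one data partition and is processed sequentially; Hadoop, MapReduce, and the HBase controller are not modeled. All function names and parameter values are hypothetical.

```python
import math
import random

# Hypothetical single-process sketch: an AdaBoost-style incremental ensemble of
# linear SVMs with a forgetting factor. Names and parameters are illustrative only.

def train_weighted_svm(data, weights, lam=0.01, epochs=20, seed=0):
    """Linear SVM trained with a weighted Pegasos variant (stochastic subgradient
    descent on the hinge loss). A constant 1.0 feature is appended so the bias is part of w."""
    rng = random.Random(seed)
    mean_wt = sum(weights) / len(weights)
    w = [0.0] * (len(data[0][0]) + 1)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = data[i][0] + [1.0], data[i][1]
            margin = y * sum(a * b for a, b in zip(w, x))
            w = [(1 - eta * lam) * a for a in w]  # regularization (shrink) step
            if margin < 1:  # hinge loss active: step toward the sample, scaled by its weight
                c = weights[i] / mean_wt
                w = [a + eta * c * y * b for a, b in zip(w, x)]
    return w

def svm_predict(w, x):
    return 1 if sum(a * b for a, b in zip(w, x + [1.0])) >= 0 else -1

def adaboost_alpha(err):
    """AdaBoost vote weight for a base classifier with weighted error err."""
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    return 0.5 * math.log((1 - err) / err)

def boost_round(data, weights):
    """Train one base SVM, compute its vote weight, and reweight samples so that
    misclassified ones gain weight. The total weight mass is preserved."""
    w = train_weighted_svm(data, weights)
    preds = [svm_predict(w, x) for x, _ in data]
    total = sum(weights)
    err = sum(wt for (_, y), p, wt in zip(data, preds, weights) if p != y) / total
    alpha = adaboost_alpha(err)
    new = [wt * math.exp(-alpha * y * p) for (_, y), p, wt in zip(data, preds, weights)]
    scale = total / sum(new)
    return (alpha, w), [nw * scale for nw in new]

def incremental_update(history, hist_weights, batch, beta=0.5, min_weight=0.2):
    """Forgetting factor (assumed form): decay old weights by beta, drop samples whose
    weight falls below min_weight, then append the new batch with weight 1.0."""
    kept = [(s, wt * beta) for s, wt in zip(history, hist_weights) if wt * beta >= min_weight]
    data = [s for s, _ in kept] + list(batch)
    weights = [wt for _, wt in kept] + [1.0] * len(batch)
    return data, weights

def ensemble_predict(models, x):
    """Combine the base classifiers by an alpha-weighted vote."""
    score = sum(alpha * svm_predict(w, x) for alpha, w in models)
    return 1 if score >= 0 else -1

def run(batches, beta=0.5, min_weight=0.2):
    """Each batch stands in for one data partition arriving over time."""
    models, history, hist_weights = [], [], []
    for batch in batches:
        data, weights = incremental_update(history, hist_weights, batch, beta, min_weight)
        model, hist_weights = boost_round(data, weights)
        models.append(model)
        history = data
    return models, history

def make_batch(n, seed):
    """Toy 2-D data labeled by the sign of x0 + x1, excluding points with |x0 + x1| < 0.5."""
    rng = random.Random(seed)
    batch = []
    while len(batch) < n:
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        s = x[0] + x[1]
        if abs(s) >= 0.5:
            batch.append((x, 1 if s > 0 else -1))
    return batch

models, kept = run([make_batch(100, seed) for seed in range(4)])
test_set = make_batch(200, 99)
acc = sum(ensemble_predict(models, x) == y for x, y in test_set) / len(test_set)
print(len(models), len(kept), acc)
```

In the setting the abstract describes, the per-partition training would run as parallel map tasks, with intermediate results persisted between iterations; the plain loop here only illustrates the reweighting and forgetting logic and says nothing about the reported speedup, sizeup, or scaleup.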