Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (02): 419-424. DOI: 10.3724/SP.J.1087.2012.00419

• Artificial Intelligence •

Two-stage fast training method based on core vector machine and support vector machine

PU Jun-yi, LEI Xiu-ren

  1. School of Sciences, South China University of Technology, Guangzhou, Guangdong 510640, China
  • Received: 2011-07-12 Revised: 2011-09-13 Online: 2012-02-23 Published: 2012-02-01
  • Contact: PU Jun-yi
  • About the authors: PU Jun-yi (born 1986), male, from Guangzhou, Guangdong, master's student; main research interests: support vector machines, core vector machines, parallel algorithms;
    LEI Xiu-ren (born 1964), male, from Changning, Hunan, associate professor, Ph.D.; main research interests: matrix computation, parallel algorithms.

Abstract: Support Vector Machine (SVM) is an effective and widely used pattern classification method, but on large data sets its training time is long and its generalization ability declines. Core Vector Machine (CVM) is a technique for scaling up a two-class SVM to large data sets; its time complexity is independent of the number of samples, but its training time grows rapidly as the number of Support Vectors (SVs) increases, which makes it computationally infeasible for data sets with a large number of SVs. To address these problems, a two-stage fast training algorithm combining CVM with SVM (CCS) is proposed. In the first stage, CVM performs a preliminary training pass, potential core vectors are selected on the basis of the Minimum Enclosing Ball (MEB), and a new training set is constructed from the samples most likely to affect the solution, which reduces the size of the training set; a labeling method is used to extract the new samples quickly. In the second stage, SVM is trained on the new training set. In comparisons with SVM and CVM on six data sets, the experimental results show that CCS reduces the training time by more than 30% on average while maintaining classification accuracy, and that it is an effective learning algorithm for large-scale classification.

Key words: Support Vector Machine (SVM), classification, large data set, Core Vector Machine (CVM), Minimum Enclosing Ball (MEB)

CLC Number:
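The abstract describes a first stage that uses an MEB-based procedure to pick a small set of candidate vectors before a second-stage SVM is trained on them. The sketch below illustrates only the core-set idea behind that first stage, and it is not the authors' CCS algorithm: it swaps in the simple Badoiu-Clarkson iteration as the MEB approximation, works in plain Euclidean input space rather than a kernel feature space, and ignores class labels. The names `approx_meb`, `eps`, `data`, and `reduced` are hypothetical and chosen for illustration.

```python
import math
import random


def approx_meb(points, eps=0.1):
    """Approximate a minimum enclosing ball (Badoiu-Clarkson iteration).

    Assumption: Euclidean input space, no kernel. Returns
    (center, radius, core_indices), where core_indices are the
    farthest points visited while the center is being moved.
    """
    center = list(points[0])
    core = {0}
    iterations = math.ceil(1.0 / eps ** 2)
    for i in range(1, iterations + 1):
        # Index of the point farthest from the current center.
        far = max(range(len(points)),
                  key=lambda j: math.dist(center, points[j]))
        core.add(far)
        # Move the center a shrinking step toward that point.
        step = 1.0 / (i + 1)
        center = [c + step * (p - c) for c, p in zip(center, points[far])]
    # Radius that makes the ball around the final center enclose all points.
    radius = max(math.dist(center, p) for p in points)
    return center, radius, sorted(core)


# Hypothetical data: 2000 two-dimensional Gaussian samples.
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2000)]
center, radius, core = approx_meb(data, eps=0.1)

# Reduced training set: only the points that ended up in the core set.
# A second stage would pass this subset to an SVM trainer of choice.
reduced = [data[j] for j in core]
```

The size of the core set is bounded by the number of iterations, which depends on `eps` and not on the number of samples. This is the property that makes a core-set first stage attractive for shrinking a large training set before a slower solver is applied.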