Journal of Computer Applications (计算机应用) ›› 2009, Vol. 29 ›› Issue (10): 2736-2740.

• Artificial Intelligence •

New reduction strategy of large-scale training sample set for SVM

ZHU Fang (朱方)1, GU Junhua (顾军华)2, YANG Xinwei (杨欣伟)1, YANG Ruixia (杨瑞霞)3

  1. Hebei University of Technology
  2. School of Computer Science and Software, Hebei University of Technology
  3. School of Information Engineering, Hebei University of Technology
  • Received: 2009-04-16 Revised: 2009-06-01 Online: 2009-10-01 Published: 2009-10-28
  • Corresponding author: YANG Xinwei
  • Supported by:
    Natural Science Foundation of Tianjin

Abstract: In many practical applications, the training sample set of a Support Vector Machine (SVM) is large and contains outliers of one class mixed into the other. This leads to slow learning, high memory requirements, and reduced generalization ability, which together form a bottleneck for applying SVMs directly. To address these problems, this paper analyzes the structure of the training sample set on the basis of point set theory and proposes a new reduction strategy for large-scale SVM training sample sets. The strategy uses fuzzy clustering to quickly extract the potential support vectors and to remove the non-boundary outliers mixed into the other class. While reducing the size of the training sample set, it effectively avoids the over-learning caused by outliers and thereby improves the generalization performance of the SVM, and it speeds up training without reducing classification accuracy.

Key words: Support Vector Machine (SVM), point set, Fuzzy C-Means (FCM), potential support vector, outlier
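The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a fuzzy C-means based reduction step might look, not the authors' procedure. Everything specific here is an assumption made for illustration: the function names `fuzzy_c_means` and `reduce_training_set`, the initialization of one cluster center per class at the class mean, the fuzzifier `m = 2`, and the rule that a sample is a potential support vector when its largest membership falls below a hypothetical `boundary_threshold`. A sample that belongs confidently to the cluster of the other class is treated as a non-boundary outlier.

```python
import math


def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy C-means updates, starting from the given centers.

    Returns (centers, memberships); memberships[k][i] is the degree to
    which point k belongs to cluster i, computed in the last iteration run.
    """
    centers = [list(c) for c in init_centers]
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point coincides with a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((di / dj) ** exponent for dj in dists)
                       for di in dists]
            memberships.append(row)
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[j] for w, p in zip(weights, points)) / total
                 for j in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def reduce_training_set(X, y, boundary_threshold=0.8):
    """Illustrative reduction rule (an assumption, not the paper's rule).

    Returns (kept, outliers) as lists of sample indices:
      kept     - samples with ambiguous membership, i.e. near the boundary
      outliers - samples that sit confidently in the other class's cluster
    Confident samples inside their own class are dropped as interior points.
    """
    labels = sorted(set(y))
    # Assumption: one cluster per class, initialized at the class mean,
    # so that cluster i can be identified with labels[i].
    init_centers = []
    for label in labels:
        members = [x for x, t in zip(X, y) if t == label]
        init_centers.append([sum(col) / len(members) for col in zip(*members)])
    _, memberships = fuzzy_c_means(X, init_centers)

    kept, outliers = [], []
    for k, (row, target) in enumerate(zip(memberships, y)):
        top = max(range(len(row)), key=row.__getitem__)
        if row[top] < boundary_threshold:
            kept.append(k)
        elif labels[top] != target:
            outliers.append(k)
    return kept, outliers
```

Under these assumptions, the reduced set `[X[k] for k in kept]` would then be passed to any SVM trainer in place of the full sample set. The threshold controls the trade-off: a higher value keeps more samples near the boundary at the cost of a larger training set.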