Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (4): 945-948. DOI: 10.11772/j.issn.1001-9081.2017092228

Local focus support vector machine algorithm

ZHOU Yuhao1, ZHANG Hongling1, LI Fangfei2, QI Peng1   

  1. College of Petroleum Engineering, China University of Petroleum, Beijing 102249, China;
  2. School of Media and Communication, Wuhan Textile University, Wuhan, Hubei 430000, China
  • Received: 2017-09-12 Revised: 2017-12-18 Online: 2018-04-10 Published: 2018-04-09

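  • Corresponding author: ZHOU Yuhao
  • About the authors: ZHOU Yuhao (born 1992), male, from Dongying, Shandong, is a master's student whose research interests include big data for oil and gas field development and artificial intelligence for oil and gas fields; ZHANG Hongling (born 1966), female, from Weihai, Shandong, is an associate professor with a master's degree whose research interests include big data for oil and gas field development and numerical simulation of oil and gas fields; LI Fangfei (born 1998), female, from Dongying, Shandong, has research interests in image enhancement and computer vision; QI Peng (born 1993), male, from Panjin, Liaoning, is a master's student whose research interests include heavy oil development and big data for oil and gas field development.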

Abstract: To address class imbalance in training data sets, an ensemble Support Vector Machine (SVM) classification algorithm was proposed that combines a sampling method with an ensemble method. First, unsupervised clustering was performed on the unbalanced training set. Then, the bottom-layer local focus SVMs were used to partition the data set locally, so as to capture the local features of the data precisely. Finally, a top-layer SVM was used to predict the classification. Evaluation results on UCI data sets show that, compared with popular algorithms such as the sampling-based Kernelized Synthetic Minority Over-sampling TEchnique (K-SMOTE), the ensemble-based Gradient Tree Boosting (GTB) and the cost-sensitive ensemble algorithm AdaCost, the proposed algorithm clearly improves classification performance and alleviates the unbalanced data set problem to a certain extent.

Key words: unbalanced data set, Support Vector Machine (SVM), ensemble algorithm, unsupervised clustering
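
The three steps named in the abstract (cluster, fit bottom-layer local SVMs, classify with a top-layer SVM) can be illustrated with a minimal Python sketch built on scikit-learn. It is not the authors' implementation, since the abstract does not specify the details. The class name `LocalFocusSVMSketch`, the parameters `n_clusters` and `far_weight`, the use of k-means, the idea of focusing each local SVM on its cluster through per-sample weights, and the balanced class weights are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


class LocalFocusSVMSketch:
    """Illustrative two-layer SVM ensemble (hypothetical, not the paper's code)."""

    def __init__(self, n_clusters=3, far_weight=0.1, random_state=0):
        self.n_clusters = n_clusters      # assumed: number of local regions
        self.far_weight = far_weight      # assumed: weight of out-of-cluster samples
        self.random_state = random_state

    def _local_features(self, X):
        # One column per bottom-layer SVM: its signed decision value for each sample.
        return np.column_stack(
            [svm.decision_function(X) for svm in self.local_svms_]
        )

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        if np.unique(y).size != 2:
            raise ValueError("this sketch handles binary labels only")

        # Step 1: unsupervised clustering of the unbalanced training set.
        self.kmeans_ = KMeans(
            n_clusters=self.n_clusters, n_init=10, random_state=self.random_state
        ).fit(X)

        # Step 2: one bottom-layer SVM per cluster. Each sees all samples but
        # gives full weight only to its own cluster (assumed "local focus").
        self.local_svms_ = []
        for k in range(self.n_clusters):
            weights = np.where(self.kmeans_.labels_ == k, 1.0, self.far_weight)
            svm = SVC(kernel="rbf", class_weight="balanced")
            svm.fit(X, y, sample_weight=weights)
            self.local_svms_.append(svm)

        # Step 3: top-layer SVM trained on the stacked local decision values.
        self.top_svm_ = SVC(kernel="rbf", class_weight="balanced")
        self.top_svm_.fit(self._local_features(X), y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return self.top_svm_.predict(self._local_features(X))
```

A typical call would be `LocalFocusSVMSketch(n_clusters=3).fit(X_train, y_train).predict(X_test)`. Training every local SVM on all samples with reduced weights, instead of only on its own cluster, is a design choice of this sketch: it guarantees that each local model sees both classes even when a cluster contains a single class.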
