Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (6): 1618-1621. DOI: 10.11772/j.issn.1001-9081.2014.06.1618

• Artificial intelligence •

Modified proximal support vector machine algorithm for dealing with unbalanced samples

LIU Yan1,2, ZHONG Ping2, CHEN Jing2, SONG Xiaohua1, HE Yun3

  1. College of Mechanical and Electrical Engineering, Yanching Institute of Technology, Langfang, Hebei 065201, China
  2. College of Science, China Agricultural University, Beijing 100083, China
  3.
  • Received: 2013-11-18 Revised: 2014-01-21 Online: 2014-06-01 Published: 2014-07-02
  • Contact: CHEN Jing
  • About the authors: LIU Yan (born 1984), female, from Loudi, Hunan; teaching assistant, M.S.; research interests: data mining, support vector machines. ZHONG Ping (born 1971), female, from Qingdao, Shandong; professor, Ph.D.; research interests: support vector machines. CHEN Jing (born 1964), female, from Nanyang, Henan; professor; research interests: data mining, support vector machines, optimization methods, numerical simulation.
  • Supported by:

    National Natural Science Foundation of China

Abstract:

When the Proximal Support Vector Machine (PSVM) is trained on unbalanced samples, it overfits the class with more samples and underestimates the misclassification error of the class with fewer samples, which lowers the classification accuracy over the whole sample set. To address this problem, a modified PSVM algorithm for unbalanced samples is proposed. The new algorithm not only assigns different penalty factors to the positive and negative classes, but also adds a new parameter to the constraints, making the classification hyperplane more flexible. The algorithm first trains on the training set to obtain the optimal parameters, then trains on the test set to obtain the classification hyperplane, and finally outputs the classification results. Experiments on 9 datasets from the UCI database show that the new algorithm improves classification accuracy by an average of 2.19 percentage points in the linear case and 3.14 percentage points in the nonlinear case, effectively improving the generalization ability of the model.
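The abstract does not state the optimization problem, so the following is only a minimal sketch of the class-dependent penalty idea, not the authors' formulation. It assumes the standard linear PSVM objective with the single penalty replaced by per-class weights, i.e. minimizing ½ Σ c_i ξ_i² + ½(‖w‖² + γ²) subject to y_i(x_i·w − γ) + ξ_i = 1, which has the closed-form solution (I + EᵀCE) z = Eᵀ(c ∘ y) with E = [A, −e] and z = [w; γ]. The additional constraint parameter mentioned in the abstract is not reproduced because it is not specified here, and the names `fit_weighted_psvm`, `predict`, `c_pos`, and `c_neg` are hypothetical.

```python
import numpy as np


def fit_weighted_psvm(A, y, c_pos=1.0, c_neg=1.0):
    """Linear PSVM with separate penalties for the two classes (illustrative sketch).

    A: (m, n) sample matrix; y: labels in {+1, -1}.
    c_pos / c_neg: assumed penalty factors for positive / negative samples.
    Returns (w, gamma) of the separating plane x.w - gamma = 0.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    # Augment the data with a -1 column so that E @ [w; gamma] = A @ w - gamma.
    E = np.hstack([A, -np.ones((A.shape[0], 1))])
    # Per-sample penalty: c_pos for positive samples, c_neg for negative ones.
    c = np.where(y > 0, c_pos, c_neg)
    # Normal equations of the weighted, regularized least-squares problem.
    H = np.eye(E.shape[1]) + E.T @ (c[:, None] * E)
    z = np.linalg.solve(H, E.T @ (c * y))
    return z[:-1], z[-1]


def predict(A, w, gamma):
    """Assign +1 or -1 according to the side of the plane x.w - gamma = 0."""
    scores = np.asarray(A, dtype=float) @ w - gamma
    return np.where(scores >= 0, 1, -1)
```

In this sketch, raising `c_pos` when the positive class is the minority makes errors on that class costlier, which moves the plane toward the majority class. Choosing the penalty values (for example by cross-validation on the training set) is left out.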

CLC Number: