Modified proximal support vector machine algorithm for dealing with unbalanced samples

doi:10.11772/j.issn.1001-9081.2014.06.1618

Journal of Computer Applications, 2014, Vol. 34, Issue (6): 1618-1621. DOI: 10.11772/j.issn.1001-9081.2014.06.1618


LIU Yan¹,², ZHONG Ping², CHEN Jing², SONG Xiaohua¹, HE Yun³

1. College of Mechanical and Electrical Engineering, Yanching Institute of Technology, Langfang, Hebei 065201, China
2. College of Science, China Agricultural University, Beijing 100083, China
3.

Received: 2013-11-18 Revised: 2014-01-21 Online: 2014-06-01 Published: 2014-07-02
Corresponding author: CHEN Jing
About the authors: LIU Yan (born 1984), female, from Loudi, Hunan, teaching assistant, M.S.; research interests: data mining, support vector machines. ZHONG Ping (born 1971), female, from Qingdao, Shandong, professor, Ph.D.; research interests: support vector machines. CHEN Jing (born 1964), female, from Nanyang, Henan, professor; research interests: data mining, support vector machines, optimization methods, numerical simulation.
Funding:
Project supported by the National Natural Science Foundation of China

Abstract

When a Proximal Support Vector Machine (PSVM) is trained on unbalanced samples, it overfits the class with more samples and underestimates the misclassification error of the class with fewer samples, which lowers the overall classification accuracy. To address this problem, a modified PSVM algorithm for unbalanced samples is proposed. The new algorithm not only assigns different penalty factors to the positive and negative classes, but also adds a new parameter to the constraints, making the classification hyperplane more flexible. The algorithm first trains on the training set to obtain the optimal parameters, then trains on the test set to obtain the classification hyperplane, and finally outputs the classification results. Experiments on 9 datasets from the UCI database show that the new algorithm improves classification accuracy by an average of 2.19 percentage points in the linear case and 3.14 percentage points in the nonlinear case, effectively improving the generalization ability of the model.
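The abstract gives no formulas, so the following is only a minimal sketch of the first idea (different penalty factors for the two classes) applied to the standard linear PSVM formulation of Fung and Mangasarian. The additional constraint parameter of the proposed algorithm is not specified on this page and is left out. The function names, the parameters `nu_pos` and `nu_neg`, and the toy data are assumptions made for illustration, not the authors' implementation.

```python
def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_weighted_psvm(X, labels, nu_pos, nu_neg):
    """Linear PSVM with class-dependent penalties (illustrative assumption).

    Minimizes 0.5 * sum_i c_i * y_i**2 + 0.5 * (w.w + gamma**2)
    subject to d_i * (x_i.w - gamma) + y_i = 1,
    where c_i is nu_pos for label +1 and nu_neg for label -1.
    The optimum solves (I + E^T C E) z = E^T C d,
    with E = [X, -1], C = diag(c_i), and z = [w; gamma].
    """
    E = [list(x) + [-1.0] for x in X]
    c = [nu_pos if d > 0 else nu_neg for d in labels]
    k = len(E[0])
    # I + E^T C E is positive definite, so the system always has a solution.
    M = [[(1.0 if i == j else 0.0)
          + sum(ci * row[i] * row[j] for ci, row in zip(c, E))
          for j in range(k)] for i in range(k)]
    rhs = [sum(ci * d * row[i] for ci, d, row in zip(c, labels, E))
           for i in range(k)]
    z = solve(M, rhs)
    return z[:-1], z[-1]


def predict(w, gamma, x):
    """Classify x by the sign of x.w - gamma."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - gamma
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy unbalanced data (made up for illustration): six negatives, two positives.
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
         (3.0, 3.0), (3.5, 2.5)]
    labels = [-1] * 6 + [1] * 2
    w, gamma = fit_weighted_psvm(X, labels, nu_pos=3.0, nu_neg=1.0)
    print(w, gamma, [predict(w, gamma, x) for x in X])
```

Giving the minority class the larger penalty (`nu_pos > nu_neg` in the toy call) makes its fitting error count more in the objective, which is the intuition behind class-dependent penalty factors.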

CLC Number:

TP18

LIU Yan, ZHONG Ping, CHEN Jing, SONG Xiaohua, HE Yun. Modified proximal support vector machine algorithm for dealing with unbalanced samples[J]. Journal of Computer Applications, 2014, 34(6): 1618-1621.


