Imbalanced data learning based on particle swarm optimization

Journal of Computer Applications, 2013, Vol. 33, Issue (03): 789-792. DOI: 10.3724/SP.J.1087.2013.00789

CAO Peng^1,2*, LI Bo^1,2, LI Wei^1,2, ZHAO Dazhe^1,2

1. College of Information Science and Engineering, Northeastern University, Shenyang Liaoning 110004, China;
2. Key Laboratory of Medical Image Computing of Ministry of Education (Northeastern University), Shenyang Liaoning 110179, China

Received: 2012-09-03 Revised: 2012-10-08 Online: 2013-03-01 Published: 2013-03-01
Corresponding author: CAO Peng
About the authors: CAO Peng (1982-), male, from Shenyang, Liaoning, Ph.D. candidate, research interests: machine learning, image mining; LI Bo (1985-), male, from Shenyang, Liaoning, Ph.D. candidate, research interests: image retrieval and mining; LI Wei (1980-), male, from Shenyang, Liaoning, Ph.D. candidate, research interests: text mining; ZHAO Dazhe (1960-), female, from Shenyang, Liaoning, professor, research interests: software engineering, data mining, medical image processing.
Funding:
National Natural Science Foundation of China (61001047); Fundamental Research Funds for the Central Universities (N110618001).

Abstract: To improve the performance of re-sampling algorithms on imbalanced data, a learning method based on Particle Swarm Optimization (PSO) is proposed. Using an evaluation metric for imbalanced data classification as the objective function, PSO optimizes the sampling rate of the re-sampling algorithm and selects the feature subset at the same time, so as to reach the best data distribution. The method was tested on a large number of UCI datasets and compared with other imbalanced learning algorithms. The experimental results show that it achieves higher classification performance, and confirm that optimizing the sampling rate and the feature set simultaneously effectively improves the classification of imbalanced data.

Key words: Particle Swarm Optimization (PSO), swarm intelligence, imbalanced data classification, re-sampling, feature selection
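The idea summarized in the abstract, one PSO search over both the re-sampling rate and the feature subset, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the particle encoding (first coordinate is the scaled sampling rate, the remaining coordinates select a feature when above 0.5), duplication-based oversampling as a stand-in for the re-sampling algorithm, a Gaussian naive Bayes classifier, G-mean as the imbalanced-data metric, the synthetic data, and all parameter values such as `MAX_RATE`.

```python
import math
import random

MAX_RATE = 9.0  # assumed upper bound: extra minority copies per original minority sample

def make_data(rng, n_major, n_minor, n_informative=2, n_noise=4):
    """Synthetic imbalanced data; label 1 is the minority class."""
    rows = []
    for label, count in ((0, n_major), (1, n_minor)):
        for _ in range(count):
            x = [rng.gauss(1.5 * label, 1.0) for _ in range(n_informative)]
            x += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]  # uninformative features
            rows.append((x, label))
    rng.shuffle(rows)
    return rows

def oversample(rows, rate):
    """Append round(rate * n_minority) duplicated minority rows (stand-in re-sampler)."""
    minority = [r for r in rows if r[1] == 1]
    return rows + [minority[i % len(minority)] for i in range(round(rate * len(minority)))]

def fit_gnb(rows, mask):
    """Gaussian naive Bayes on the selected feature indices."""
    model = {}
    for label in (0, 1):
        cols = list(zip(*[[x[j] for j in mask] for x, y in rows if y == label]))
        n = len(cols[0])
        means = [sum(c) / n for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / n + 1e-6 for c, m in zip(cols, means)]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model

def predict(model, x, mask):
    def score(label):
        log_prior, means, variances = model[label]
        return log_prior - 0.5 * sum(math.log(2 * math.pi * s) + (x[j] - m) ** 2 / s
                                     for j, m, s in zip(mask, means, variances))
    return max((0, 1), key=score)

def fitness(position, train, valid):
    """Objective: G-mean on the validation rows for the decoded rate and feature mask."""
    rate = position[0] * MAX_RATE
    mask = [j for j, p in enumerate(position[1:]) if p > 0.5]
    if not mask:
        return 0.0
    model = fit_gnb(oversample(train, rate), mask)
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for x, y in valid:
        totals[y] += 1
        hits[y] += int(predict(model, x, mask) == y)
    return math.sqrt((hits[1] / totals[1]) * (hits[0] / totals[0]))

def pso(train, valid, n_features, rng, n_particles=12, n_iters=20, w=0.7, c1=1.5, c2=1.5):
    dim = 1 + n_features
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, train, valid) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))  # clamp to [0, 1]
            f = fitness(pos[i], train, valid)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        history.append(gbest_fit)
    return gbest, gbest_fit, history

rng = random.Random(0)
train = make_data(rng, 180, 20)
valid = make_data(rng, 90, 10)
n_features = len(train[0][0])
best, best_fit, history = pso(train, valid, n_features, rng)
best_rate = best[0] * MAX_RATE
best_mask = [j for j, p in enumerate(best[1:]) if p > 0.5]
print(f"G-mean={best_fit:.3f} rate={best_rate:.2f} features={best_mask}")
```

Because the same validation rows drive the search and the reported score, the printed G-mean is optimistic; a separate test split would be needed for a fair estimate. The duplication-based `oversample` could be swapped for a synthetic oversampler without changing the particle encoding.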

CLC Number: TP391

CAO Peng, LI Bo, LI Wei, ZHAO Dazhe. Imbalanced data learning based on particle swarm optimization[J]. Journal of Computer Applications, 2013, 33(03): 789-792.

