Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (03): 789-792. DOI: 10.3724/SP.J.1087.2013.00789

• Artificial Intelligence •

Imbalanced data learning based on particle swarm optimization

CAO Peng1,2*, LI Bo1,2, LI Wei1,2, ZHAO Dazhe1,2

  1. College of Information Science and Engineering, Northeastern University, Shenyang Liaoning 110004, China;
  2. Key Laboratory of Medical Image Computing of Ministry of Education (Northeastern University), Shenyang Liaoning 110179, China
  • Received: 2012-09-03 Revised: 2012-10-08 Online: 2013-03-01 Published: 2013-03-01
  • Contact: CAO Peng
  • About the authors: CAO Peng (born 1982), male, from Shenyang, Liaoning, Ph.D. candidate; main research interests: machine learning, image mining. LI Bo (born 1985), male, from Shenyang, Liaoning, Ph.D. candidate; main research interests: image retrieval and mining. LI Wei (born 1980), male, from Shenyang, Liaoning, Ph.D. candidate; main research interest: text mining. ZHAO Dazhe (born 1960), female, from Shenyang, Liaoning, professor; main research interests: software engineering, data mining, medical image processing.
  • Supported by: National Natural Science Foundation of China (61001047); Fundamental Research Funds for the Central Universities (N110618001).

Abstract: To improve the performance of re-sampling algorithms in imbalanced data learning, a method based on Particle Swarm Optimization (PSO) is proposed. Using an evaluation metric for imbalanced data classification as the objective function, PSO optimizes the re-sampling rate of the re-sampling algorithm and selects the feature set at the same time, so as to reach the best data distribution. The method was tested on a large number of UCI datasets and compared with other imbalanced learning algorithms. The experimental results show that it achieves higher classification performance, and that optimizing the re-sampling rate and the feature set simultaneously effectively improves classification on imbalanced data.

Key words: Particle Swarm Optimization (PSO), swarm intelligence, imbalanced data classification, re-sampling, feature selection
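
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the re-sampling algorithm, the classifier, the evaluation metric, or the particle encoding, so everything below is an assumption chosen for illustration. Here each particle is a vector in [0, 1] whose first component encodes the over-sampling rate and whose remaining components switch features on when they exceed 0.5; the objective is the G-mean of a 3-nearest-neighbour classifier on a validation set; the over-sampler interpolates between random minority pairs; the data are synthetic. All function names (`make_data`, `oversample`, `decode`, `fitness`, `pso`) are hypothetical.

```python
import math
import random

def make_data(rng, n_major=80, n_minor=12, n_noise=2):
    """Toy imbalanced data (assumption): 2 informative features plus noise."""
    rows = []
    for label, n, center in ((0, n_major, 0.0), (1, n_minor, 1.5)):
        for _ in range(n):
            x = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
            x += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            rows.append((x, label))
    rng.shuffle(rows)
    return rows

def oversample(train, rate, rng):
    """Add int(rate * #minority) synthetic points between random minority pairs."""
    minority = [x for x, y in train if y == 1]
    extra = []
    for _ in range(int(rate * len(minority))):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        extra.append(([ai + t * (bi - ai) for ai, bi in zip(a, b)], 1))
    return train + extra

def knn_predict(train, x, mask, k=3):
    """Majority vote of the k nearest neighbours, using only masked-in features."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b, m in zip(xt, x, mask) if m), y)
        for xt, y in train
    )
    return 1 if 2 * sum(y for _, y in dists[:k]) > k else 0

def g_mean(train, valid, mask):
    """Geometric mean of sensitivity and specificity on the validation set."""
    tp = tn = pos = neg = 0
    for x, y in valid:
        pred = knn_predict(train, x, mask)
        if y == 1:
            pos += 1
            tp += pred == 1
        else:
            neg += 1
            tn += pred == 0
    return math.sqrt((tp / pos) * (tn / neg))

def decode(position, max_rate=5.0):
    """Map a particle in [0, 1]^d to (sampling rate, feature mask)."""
    return position[0] * max_rate, [p > 0.5 for p in position[1:]]

def fitness(position, train, valid, seed):
    rate, mask = decode(position)
    if not any(mask):  # an empty feature set cannot classify anything
        return 0.0
    resampled = oversample(train, rate, random.Random(seed))
    return g_mean(resampled, valid, mask)

def pso(train, valid, dim, n_particles=10, n_iter=15,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO maximizing the fitness; positions are clamped to [0, 1]."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, train, valid, seed) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), 1.0)
            f = fitness(pos[i], train, valid, seed)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

rng = random.Random(42)
train = make_data(rng)
valid = make_data(rng, n_major=40, n_minor=8)
n_features = len(train[0][0])
best, best_fit = pso(train, valid, dim=1 + n_features)
rate, mask = decode(best)
baseline = g_mean(train, valid, [True] * n_features)
print(f"baseline G-mean (no re-sampling, all features): {baseline:.3f}")
print(f"PSO G-mean: {best_fit:.3f}, rate={rate:.2f}, mask={mask}")
```

Encoding the sampling rate and the feature mask in one particle is what lets a single objective evaluation score both choices together, which is the joint optimization the abstract describes. In a realistic setting the fitness would be estimated by cross-validation rather than on one fixed validation split.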
