Journal of Computer Applications, 2011, Vol. 31, Issue (09): 2399-2401. DOI: 10.3724/SP.J.1087.2011.02399

• Database Technology •

ISMOTE Algorithm Of Facing The Imbalanced Data Sets

XU Dan-dan¹, WANG Yong², CAI Li-jun¹

  1. School of Science, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
  2. School of Computer Science and Technology, Northwestern Polytechnical University, Xi'an Shaanxi 710072, China
  • Received: 2011-03-15 Revised: 2011-05-17 Online: 2011-09-01 Published: 2011-09-01
  • Contact: XU Dan-dan
  • About the authors: XU Dan-dan (born 1984), female, from Tangshan, Hebei, master's student; research interest: sampling algorithms for skewed data;
    WANG Yong (born 1973), male, from Xi'an, Shaanxi, associate professor, Ph.D.; research interest: data mining;
    CAI Li-jun (born 1963), male, from Xi'an, Shaanxi, associate professor, Ph.D.; research interests: flight vehicle control, guidance and simulation, differential games, decentralized control.
  • Supported by:
    National Natural Science Foundation of China (60873196)

Abstract: To improve classification performance on the minority class in imbalanced data sets, an algorithm named ISMOTE (Improved Synthetic Minority Over-sampling TEchnique) is proposed. ISMOTE reduces the imbalance of the data distribution by random interpolation within the n-dimensional ball formed by a minority class instance and its nearest minority class neighbor. Experiments on real data sets compare ISMOTE with SMOTE (Synthetic Minority Over-sampling TEchnique) and with classifying the imbalanced data directly. The results show that ISMOTE achieves higher classification accuracy and can effectively improve classifier performance.

Key words: imbalanced dataset, classification, virtual instances, Synthetic Minority Over-sampling TEchnique (SMOTE)
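
The interpolation idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact geometry of the ball, so the code below assumes one possible reading: the ball is centered at a minority instance and its radius is the distance to that instance's nearest minority neighbor, with points drawn uniformly from the ball. The function names (`nearest_minority_neighbor`, `sample_in_ball`, `ball_oversample`) are hypothetical, and this should not be taken as the authors' implementation. For contrast, standard SMOTE places each synthetic point on the line segment between an instance and one of its neighbors rather than inside a ball.

```python
import math
import random


def nearest_minority_neighbor(i, minority):
    """Index of the minority instance closest to minority[i] (Euclidean)."""
    return min(
        (j for j in range(len(minority)) if j != i),
        key=lambda j: math.dist(minority[i], minority[j]),
    )


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the n-ball around `center`."""
    n = len(center)
    # A normalized Gaussian vector gives a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # Scaling by U**(1/n) makes the density uniform over the ball's volume.
    r = radius * rng.random() ** (1.0 / n)
    return [c + r * d / norm for c, d in zip(center, direction)]


def ball_oversample(minority, n_new, seed=0):
    """Generate n_new synthetic minority instances.

    Assumption (not stated in the abstract): the ball is centered at the
    chosen instance, with radius equal to the distance to its nearest
    minority neighbor. Needs at least two minority instances.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = nearest_minority_neighbor(i, minority)
        radius = math.dist(minority[i], minority[j])
        synthetic.append(sample_in_ball(minority[i], radius, rng))
    return synthetic
```

A call such as `ball_oversample([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], n_new=5)` returns five synthetic points, each within the assumed ball of one of the three minority instances. These points would then be added to the training set before fitting a classifier.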

CLC Number: