Journal of Computer Applications, 2022, Vol. 42, Issue (11): 3307-3321. DOI: 10.11772/j.issn.1001-9081.2021122060

• CCF Bigdata 2021 •

Survey on imbalanced multi‑class classification algorithms

Mengmeng LI¹, Yi LIU¹, Gengsong LI¹, Qibin ZHENG², Wei QIN¹, Xiaoguang REN¹

  1. Defense Innovation Institute, Academy of Military Science, Beijing 100071, China
  2. Academy of Military Science, Beijing 100091, China
  • Received: 2021-12-06 Revised: 2021-12-30 Accepted: 2022-01-18 Online: 2022-03-04 Published: 2022-11-10
  • Contact: Yi LIU
  • About the authors: LI Mengmeng, born in 1992, M. S. candidate. Her research interests include data quality and evolutionary algorithms.
    LIU Yi, born in 1990, Ph. D., research assistant. His research interests include robot operating systems, data quality and evolutionary algorithms.
    LI Gengsong, born in 1999, M. S. candidate. His research interests include big data and algorithm selection.
    ZHENG Qibin, born in 1990, Ph. D., research assistant. His research interests include data engineering, data mining and machine learning.
    QIN Wei, born in 1983, M. S., research assistant. His research interests include intelligent information system management.
    REN Xiaoguang, born in 1986, Ph. D., associate research fellow. His research interests include robot operating systems, high-performance computing, and numerical computation and simulation.
  • Supported by:
    National Natural Science Foundation of China (61802426)

Survey on imbalanced multi-class classification algorithms

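LI Mengmeng¹, LIU Yi¹, LI Gengsong¹, ZHENG Qibin², QIN Wei¹, REN Xiaoguang¹

  1. Defense Innovation Institute, Academy of Military Science, Beijing 100071
  2. Academy of Military Science, Beijing 100091
  • Corresponding author: LIU Yi
  • About the authors: LI Mengmeng (1992-), female, born in Handan, Hebei, M. S. candidate. Main research interests: data quality, evolutionary algorithms.
    LIU Yi (1990-), male (Hui ethnicity), born in Bengbu, Anhui, research assistant, Ph. D. Main research interests: robot operating systems, data quality, evolutionary algorithms. albertliu20th@163.com
    LI Gengsong (1999-), male, born in Changsha, Hunan, M. S. candidate. Main research interests: big data, algorithm selection.
    ZHENG Qibin (1990-), male, born in Lanzhou, Gansu, research assistant, Ph. D. Main research interests: data engineering, data mining, machine learning.
    QIN Wei (1983-), male, born in Fuyang, Anhui, research assistant, M. S. Main research interests: intelligent information system management.
    REN Xiaoguang (1986-), male, born in Suizhou, Hubei, associate research fellow, Ph. D. Main research interests: robot operating systems, high-performance computing, numerical computation and simulation.
  • Supported by:
    National Natural Science Foundation of China (61802426)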

Abstract:

Imbalanced data classification is an important research topic in machine learning, but most existing imbalanced classification algorithms focus on binary classification, and studies on imbalanced multi-class classification are relatively few. However, datasets in practical applications usually have multiple classes and an imbalanced data distribution, and the diversity of classes further increases the difficulty of classifying imbalanced data, so the imbalanced multi-class classification problem has become a research topic that urgently needs to be solved. The imbalanced multi-class classification algorithms proposed in recent years were reviewed. According to whether a decomposition strategy was adopted, these algorithms were divided into decomposition methods and ad-hoc methods. Furthermore, according to the decomposition strategy adopted, the decomposition methods were divided into two frameworks: One Vs. One (OVO) and One Vs. All (OVA). According to the techniques used, the ad-hoc methods were divided into data-level methods, algorithm-level methods, cost-sensitive methods, ensemble methods and deep network-based methods. The advantages and disadvantages of these methods and their representative algorithms were systematically described, the evaluation indicators of imbalanced multi-class classification methods were summarized, the performance of the representative methods was analyzed in depth through experiments, and the future development directions of imbalanced multi-class classification were discussed.
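
As an illustration of the two decomposition frameworks named in the abstract, the following is a minimal Python sketch and not code from the surveyed paper. It shows how a multi-class label set could be split into binary subproblems under OVO and OVA. The function names (`ovo_subproblems`, `ova_subproblems`, `imbalance_ratio`) and the toy label list are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def ovo_subproblems(labels):
    """One Vs. One: one binary problem per unordered pair of classes.

    Returns {(a, b): (sample_indices, binary_labels)}, where only samples
    of class a or b are kept and class a is relabelled 1, class b 0.
    """
    classes = sorted(set(labels))
    problems = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        problems[(a, b)] = (idx, [1 if labels[i] == a else 0 for i in idx])
    return problems


def ova_subproblems(labels):
    """One Vs. All: one binary problem per class, using every sample.

    Returns {c: binary_labels}, where class c is relabelled 1 and all
    other classes are merged into 0.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def imbalance_ratio(binary_labels):
    """Size of the larger binary class divided by the smaller one."""
    counts = Counter(binary_labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy data: three classes with 6, 3 and 1 samples.
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
ovo = ovo_subproblems(labels)
ova = ova_subproblems(labels)
```

On this toy data, both strategies yield three binary subproblems. Merging all other classes in OVA makes the subproblem for the smallest class `c` more skewed than any single OVO pair, which hints at why the choice of decomposition matters for imbalanced data.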

Key words: imbalanced classification, multi-class classification, imbalanced multi-class classification, classification algorithm, machine learning

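Abstract:

Imbalanced data classification is an important research topic in machine learning, but existing imbalanced classification algorithms usually target imbalanced binary classification problems, and research on imbalanced multi-class classification is relatively scarce. However, datasets in practical applications usually contain multiple classes with an imbalanced data distribution, and the diversity of classes further increases the difficulty of classifying imbalanced data, so imbalanced multi-class classification has become a research topic that urgently needs to be solved. This paper reviews the imbalanced multi-class classification algorithms proposed in recent years. According to whether a decomposition strategy is adopted, the algorithms are divided into decomposition methods and ad-hoc methods. The decomposition methods are further divided by decomposition strategy into the "One Vs. One (OVO)" framework and the "One Vs. All (OVA)" framework, and the ad-hoc methods are divided by processing technique into data-level methods, algorithm-level methods, cost-sensitive methods, ensemble methods and deep network-based methods. The advantages and disadvantages of each category and its representative algorithms are systematically described, the evaluation indicators of imbalanced multi-class classification methods are summarized, the performance of the representative methods is analyzed in depth through experiments, and the future development directions of imbalanced multi-class classification are discussed.

Key words: imbalanced classification, multi-class classification, imbalanced multi-class classification, classification algorithm, machine learning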

CLC Number: