Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3307-3321. DOI: 10.11772/j.issn.1001-9081.2021122060
Special topics: Surveys; The 9th CCF Big Data Conference (CCF Bigdata 2021)
Mengmeng LI1, Yi LIU1, Gengsong LI1, Qibin ZHENG2, Wei QIN1, Xiaoguang REN1
Received:
2021-12-06
Revised:
2021-12-30
Accepted:
2022-01-18
Online:
2022-03-04
Published:
2022-11-10
Contact:
Yi LIU
About author:
LI Mengmeng, born in 1992, M. S. candidate. Her research interests include data quality and evolutionary algorithms.
Abstract:
Imbalanced data classification is an important research topic in machine learning, but existing imbalanced classification algorithms usually target binary problems, and research on imbalanced multi-class classification is comparatively scarce. Datasets in real applications, however, typically contain multiple classes with imbalanced distributions, and the multiplicity of classes further increases the difficulty of classifying imbalanced data, making imbalanced multi-class classification an urgent research problem. This survey reviews imbalanced multi-class classification algorithms proposed in recent years. According to whether a decomposition strategy is adopted, they are divided into decomposition methods and ad-hoc methods; decomposition methods are further split by decomposition strategy into One-Versus-One (OVO) and One-Versus-All (OVA) architectures, while ad-hoc methods are split by processing technique into data-level methods, algorithm-level methods, cost-sensitive methods, ensemble methods, and deep-network-based methods. The advantages, disadvantages, and representative algorithms of each category are described systematically, the evaluation metrics for imbalanced multi-class classification are summarized, the performance of representative methods is analyzed in depth through experiments, and future research directions are discussed.
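The two decomposition architectures named in the abstract can be sketched with scikit-learn's built-in wrappers. This is a minimal illustration on synthetic data, assuming scikit-learn is available; the base classifier and the dataset parameters are arbitrary choices, not taken from the paper:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class dataset; 'weights' makes the class sizes imbalanced.
X, y = make_classification(n_samples=800, n_classes=4, n_informative=6,
                           weights=[0.55, 0.25, 0.12, 0.08], random_state=0)

# OVO: one binary classifier per pair of classes -> K*(K-1)/2 = 6 models.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# OVA (one-vs-rest): one binary classifier per class -> K = 4 models;
# each "class vs. rest" subproblem is artificially imbalanced.
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovo.estimators_), len(ova.estimators_))  # 6 4
```

Note how OVA turns even a balanced K-class problem into K skewed binary ones, which is exactly the drawback of the OVA strategy that the survey discusses.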
CLC number:
Mengmeng LI, Yi LIU, Gengsong LI, Qibin ZHENG, Wei QIN, Xiaoguang REN. Survey on imbalanced multi‑class classification algorithms[J]. Journal of Computer Applications, 2022, 42(11): 3307-3321.
Aspect | Decomposition methods | Ad-hoc methods |
---|---|---|
Advantages | 1. Each binary classifier is relatively easy to train; 2. Existing binary classification algorithms can be fully reused | 1. A multi-class classifier tailored to the problem is trained directly; 2. The distribution information of all samples is fully exploited |
Disadvantages | 1. Multiple binary classifiers must be trained, introducing the problem of combining their outputs; 2. The OVO strategy ignores the samples of the remaining classes, losing information; 3. The OVA strategy artificially introduces imbalance, which increases training difficulty and hurts model performance | 1. The classifier is harder to train; 2. Designing a new algorithm requires substantial development cost and time |

Tab. 1 Comparison of decomposition methods and ad-hoc methods
True label | Predicted positive | Predicted negative |
---|---|---|
Positive | True Positive (TP) | False Negative (FN) |
Negative | False Positive (FP) | True Negative (TN) |

Tab. 2 Confusion matrix of binary classification
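From the four counts in this matrix, the binary forms of the metrics used in the experiment tables below follow directly. A worked example with made-up counts (not values from the paper):

```python
import math

# Hypothetical confusion-matrix counts for an imbalanced binary problem:
# 100 actual positives, 900 actual negatives.
TP, FN, FP, TN = 80, 20, 10, 890

accuracy    = (TP + TN) / (TP + TN + FP + FN)
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)          # true positive rate (sensitivity)
specificity = TN / (TN + FP)          # true negative rate

f1     = 2 * precision * recall / (precision + recall)
g_mean = math.sqrt(recall * specificity)   # balances both class-wise recalls

print(round(accuracy, 4), round(f1, 4), round(g_mean, 4))  # 0.97 0.8421 0.8894
```

Accuracy looks high here mainly because the majority (negative) class dominates; F1 and G-mean expose the weaker minority-class performance, which is why the survey reports them alongside accuracy.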
No. | Dataset | Samples | Features | Classes | Instances per class | Imbalance ratio | Application domain |
---|---|---|---|---|---|---|---|
1 | contraceptive | 1 473 | 9 | 3 | 629/511/333 | 1.89 | contraceptive method choice |
2 | balance | 625 | 4 | 3 | 288/288/49 | 5.88 | balance scale |
3 | newthyroid | 215 | 5 | 3 | 150/35/30 | 5.00 | thyroid disease (new version) |
4 | splice | 3 190 | 60 | 3 | 1 655/768/767 | 2.16 | DNA splice junctions |
5 | thyroid | 7 200 | 21 | 3 | 6 666/368/166 | 40.16 | thyroid disease |
6 | wine | 178 | 13 | 3 | 71/59/48 | 1.48 | wine recognition |
7 | car | 1 728 | 6 | 4 | 1 210/384/69/65 | 18.62 | car evaluation |
8 | page_blocks | 5 472 | 10 | 5 | 4 913/329/115/87/28 | 175.46 | document page blocks |
9 | flare | 1 066 | 11 | 6 | 331/239/211/147/95/43 | 7.70 | solar flares |
10 | satimage | 6 435 | 36 | 6 | 1 533/1 508/1 358/707/703/626 | 2.45 | satellite images |

Tab. 3 Characteristics of experimental datasets
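The imbalance ratio column is the size of the largest class divided by the size of the smallest class; note that the listed values are plain ratios, not percentages. A quick check against the per-class instance counts:

```python
# Imbalance ratio = (largest class size) / (smallest class size),
# verified against the per-class instance counts listed in Table 3.
def imbalance_ratio(class_counts):
    return max(class_counts) / min(class_counts)

print(round(imbalance_ratio([629, 511, 333]), 2))            # contraceptive: 1.89
print(round(imbalance_ratio([6666, 368, 166]), 2))           # thyroid: 40.16
print(round(imbalance_ratio([4913, 329, 115, 87, 28]), 2))   # page_blocks: 175.46
```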
Method groups: decomposition (A&O, OVO, OVA, DOVO, DECOC); data-level (OVO‑SMOTE, OVA‑SMOTE); algorithm-level (OAHO); cost-sensitive (FuzzyImb); ensemble (BBO).

Dataset | A&O | OVO | OVA | DOVO | DECOC | OVO‑SMOTE | OVA‑SMOTE | OAHO | FuzzyImb | BBO |
---|---|---|---|---|---|---|---|---|---|---|
Average | 0.832 5 | 0.866 5 | 0.826 7 | 0.940 5 | 0.920 5 | 0.820 8 | 0.818 0 | 0.892 8 | 0.874 9 | 0.875 8 |
contraceptive | 0.746 1 | 0.755 6 | 0.736 6 | 0.675 5 | 0.663 7 | 0.575 1 | 0.508 1 | 0.756 3 | 0.726 1 | 0.676 2 |
balance | 0.617 6 | 0.606 4 | 0.763 2 | 0.931 3 | 0.927 7 | 0.828 7 | 0.828 9 | 0.720 0 | 0.807 2 | 0.900 8 |
newthyroid | 0.976 7 | 0.976 7 | 0.981 4 | 0.983 9 | 0.990 5 | 0.956 9 | 0.937 8 | 0.981 4 | 0.953 5 | 0.948 8 |
splice | 0.969 3 | 0.970 2 | 0.974 0 | 0.974 5 | 0.930 6 | 0.477 6 | 0.476 6 | 0.975 9 | 0.940 4 | 0.667 9 |
thyroid | 0.994 9 | 0.994 9 | 0.995 0 | 0.998 5 | 0.997 9 | 0.928 4 | 0.938 3 | 0.993 8 | 0.936 7 | 0.959 8 |
wine | 0.977 5 | 0.977 5 | 0.679 8 | 0.984 8 | 0.975 8 | 0.795 9 | 0.752 8 | 0.977 5 | 0.831 5 | 0.799 6 |
car | 0.648 1 | 0.703 1 | 0.915 5 | 0.987 1 | 0.876 8 | 0.952 8 | 0.921 6 | 0.923 6 | 0.909 4 | 0.950 1 |
page_blocks | 0.930 4 | 0.942 3 | 0.949 2 | 0.985 0 | 0.973 2 | 0.837 8 | 0.977 5 | 0.942 8 | 0.907 4 | 0.982 5 |
flare | 0.534 7 | 0.789 9 | 0.508 4 | 0.892 6 | 0.904 7 | 0.892 1 | 0.888 8 | 0.739 2 | 0.778 4 | 0.901 8 |
satimage | 0.929 4 | 0.948 3 | 0.763 6 | 0.991 8 | 0.963 7 | 0.962 4 | 0.950 0 | 0.917 9 | 0.958 3 | 0.970 8 |
Tab. 4 Accuracy values of classic methods on experimental datasets
Method groups: decomposition (A&O, OVO, OVA, DOVO, DECOC); data-level (OVO‑SMOTE, OVA‑SMOTE); algorithm-level (OAHO); cost-sensitive (FuzzyImb); ensemble (BBO).

Dataset | A&O | OVO | OVA | DOVO | DECOC | OVO‑SMOTE | OVA‑SMOTE | OAHO | FuzzyImb | BBO |
---|---|---|---|---|---|---|---|---|---|---|
Average | 0.802 1 | 0.824 0 | 0.772 1 | 0.917 5 | 0.902 7 | 0.810 7 | 0.695 9 | 0.824 1 | 0.764 5 | 0.638 1 |
contraceptive | 0.730 7 | 0.738 1 | 0.720 1 | 0.638 0 | 0.632 0 | 0.665 4 | 0.524 8 | 0.742 7 | 0.722 9 | 0.455 2 |
balance | 0.580 1 | 0.564 6 | 0.677 1 | 0.879 1 | NaN | 0.740 0 | NaN | 0.637 7 | 0.502 6 | NaN |
newthyroid | 0.969 2 | 0.969 2 | 0.974 9 | 0.950 2 | 0.974 6 | 0.904 7 | 0.892 3 | 0.974 9 | 0.941 4 | 0.864 1 |
splice | 0.967 2 | 0.968 0 | 0.971 6 | 0.965 7 | 0.905 3 | 0.587 3 | 0.541 3 | 0.973 3 | 0.929 2 | 0.613 7 |
thyroid | 0.971 3 | 0.971 3 | 0.972 2 | 0.999 2 | 0.998 9 | 0.956 0 | 0.651 4 | 0.969 1 | 0.940 8 | 0.598 0 |
wine | 0.978 0 | 0.978 0 | 0.538 1 | 0.981 7 | 0.972 9 | 0.809 1 | 0.712 6 | 0.978 0 | 0.831 8 | 0.659 6 |
car | 0.570 8 | 0.602 9 | 0.892 5 | 0.968 0 | NaN | 0.788 0 | 0.700 9 | 0.915 7 | 0.602 6 | NaN |
page_blocks | 0.728 7 | 0.781 6 | 0.601 7 | NaN | 0.872 9 | NaN | 0.743 5 | 0.466 7 | 0.568 8 | NaN |
flare | 0.610 1 | 0.723 7 | 0.641 0 | 0.885 4 | NaN | 0.882 7 | 0.637 1 | 0.661 3 | 0.694 7 | NaN |
satimage | 0.914 9 | 0.942 1 | 0.731 3 | 0.990 1 | 0.962 4 | 0.962 7 | 0.859 3 | 0.921 8 | 0.910 1 | NaN |
Tab. 5 F1 scores of classic methods on experimental datasets
Method groups: decomposition (A&O, OVO, OVA, DOVO, DECOC); data-level (OVO‑SMOTE, OVA‑SMOTE); algorithm-level (OAHO); cost-sensitive (FuzzyImb); ensemble (BBO).

Dataset | A&O | OVO | OVA | DOVO | DECOC | OVO‑SMOTE | OVA‑SMOTE | OAHO | FuzzyImb | BBO |
---|---|---|---|---|---|---|---|---|---|---|
Average | 0.897 4 | 0.914 9 | 0.860 5 | 0.925 0 | 0.887 1 | 0.773 5 | 0.781 6 | 0.907 1 | 0.842 4 | 0.744 7 |
contraceptive | 0.799 9 | 0.800 3 | 0.790 5 | 0.652 4 | 0.639 3 | 0.584 2 | 0.583 9 | 0.810 1 | 0.702 4 | 0.600 2 |
balance | 0.770 7 | 0.748 5 | 0.834 0 | 0.924 9 | 0.932 9 | 0.701 2 | 0.760 7 | 0.804 7 | 0.731 8 | 0.762 5 |
newthyroid | 0.989 9 | 0.989 9 | 0.991 9 | 0.970 0 | 0.986 0 | 0.952 0 | 0.901 6 | 0.991 9 | 0.979 7 | 0.867 0 |
splice | 0.978 2 | 0.979 0 | 0.977 5 | 0.972 7 | 0.929 5 | 0.587 0 | 0.605 6 | 0.980 4 | 0.903 4 | 0.742 2 |
thyroid | 0.997 3 | 0.997 3 | 0.997 4 | 0.990 9 | 0.990 1 | 0.654 5 | 0.693 2 | 0.996 9 | 0.929 9 | 0.646 7 |
wine | 0.982 6 | 0.982 6 | 0.724 8 | 0.985 2 | 0.976 4 | 0.814 0 | 0.795 9 | 0.985 2 | 0.888 6 | 0.759 9 |
car | 0.836 7 | 0.856 8 | 0.949 0 | 0.977 9 | 0.672 6 | 0.859 5 | 0.828 4 | 0.958 5 | 0.817 0 | 0.775 4 |
page_blocks | 0.970 2 | 0.969 9 | 0.877 7 | 0.945 7 | 0.929 2 | NaN | 0.868 3 | 0.776 8 | 0.817 5 | 0.810 2 |
flare | 0.693 7 | 0.854 0 | 0.615 6 | 0.839 3 | 0.856 9 | 0.851 0 | 0.817 9 | 0.815 5 | 0.709 9 | 0.738 2 |
satimage | 0.954 3 | 0.971 1 | 0.846 6 | 0.991 0 | 0.957 8 | 0.958 4 | 0.960 2 | 0.950 7 | 0.943 6 | NaN |
Tab. 6 AUC values of classic methods on experimental datasets
Method groups: decomposition (A&O, OVO, OVA, DOVO, DECOC); data-level (OVO‑SMOTE, OVA‑SMOTE); algorithm-level (OAHO); cost-sensitive (FuzzyImb); ensemble (BBO).

Dataset | A&O | OVO | OVA | DOVO | DECOC | OVO‑SMOTE | OVA‑SMOTE | OAHO | FuzzyImb | BBO |
---|---|---|---|---|---|---|---|---|---|---|
Average | 0.797 9 | 0.807 6 | 0.781 9 | 0.916 5 | 0.856 2 | 0.731 4 | 0.779 7 | 0.813 4 | 0.744 7 | 0.674 2 |
contraceptive | 0.615 0 | 0.614 1 | 0.596 1 | 0.635 5 | 0.626 1 | 0.427 1 | 0.510 3 | 0.638 7 | 0.607 2 | 0.547 1 |
balance | 0.551 3 | 0.514 7 | 0.653 3 | 0.920 3 | 0.928 0 | 0.529 4 | 0.724 2 | 0.592 1 | 0.465 2 | 0.598 6 |
newthyroid | 0.983 2 | 0.983 2 | 0.986 6 | 0.968 2 | 0.985 4 | 0.949 9 | 0.890 0 | 0.986 6 | 0.966 1 | 0.851 6 |
splice | 0.957 7 | 0.958 9 | 0.954 8 | 0.972 7 | 0.929 3 | 0.891 7 | 0.859 1 | 0.960 5 | 0.904 5 | 0.712 8 |
thyroid | 0.994 7 | 0.994 7 | 0.994 7 | 0.990 7 | 0.990 0 | 0.467 1 | 0.606 5 | 0.994 1 | 0.930 9 | 0.514 7 |
wine | 0.965 7 | 0.965 7 | 0.925 4 | 0.984 9 | 0.975 9 | 0.790 8 | 0.779 1 | 0.971 4 | 0.759 9 | 0.738 7 |
car | 0.563 5 | 0.606 8 | 0.860 2 | 0.977 1 | 0.447 6 | 0.831 3 | 0.810 5 | 0.889 5 | 0.642 0 | 0.702 0 |
page_blocks | 0.895 4 | 0.893 8 | 0.892 9 | 0.940 0 | 0.918 1 | 0.682 1 | 0.854 0 | 0.637 2 | 0.811 4 | 0.762 2 |
flare | 0.673 4 | 0.682 4 | 0.605 8 | 0.784 3 | 0.804 1 | 0.789 1 | 0.803 5 | 0.696 6 | 0.721 9 | 0.640 8 |
satimage | 0.779 0 | 0.861 9 | 0.349 5 | 0.991 0 | 0.957 1 | 0.955 7 | 0.959 9 | 0.767 2 | 0.637 9 | 0.673 9 |
Tab. 7 G-mean values of classic methods on experimental datasets
Method groups: decomposition (A&O, OVO, OVA, DOVO, DECOC); data-level (OVO‑SMOTE, OVA‑SMOTE); algorithm-level (OAHO); cost-sensitive (FuzzyImb); ensemble (BBO).

Dataset | A&O | OVO | OVA | DOVO | DECOC | OVO‑SMOTE | OVA‑SMOTE | OAHO | FuzzyImb | BBO |
---|---|---|---|---|---|---|---|---|---|---|
Average | 0.734 8 | 0.778 8 | 0.724 9 | 0.842 5 | 0.763 3 | 0.540 8 | 0.518 9 | 0.818 3 | 0.425 3 | 0.493 6 |
contraceptive | 0.604 6 | 0.616 0 | 0.588 1 | 0.307 9 | 0.282 2 | 0.145 9 | 0.125 9 | 0.623 7 | 0.563 2 | 0.211 2 |
balance | 0.457 6 | 0.437 8 | 0.630 8 | 0.769 9 | 0.771 9 | 0.356 8 | 0.510 7 | 0.573 2 | 0.347 4 | 0.521 8 |
newthyroid | 0.951 6 | 0.951 6 | 0.961 0 | 0.940 5 | 0.968 7 | 0.872 2 | 0.799 4 | 0.961 0 | 0.905 6 | 0.805 7 |
splice | 0.950 3 | 0.951 9 | 0.957 6 | 0.944 2 | 0.848 6 | 0.151 0 | 0.147 7 | 0.960 8 | 0.536 1 | 0.388 8 |
thyroid | 0.964 3 | 0.964 3 | 0.965 3 | 0.975 7 | 0.970 5 | 0.388 3 | 0.391 0 | 0.956 9 | 0.022 0 | 0.398 4 |
wine | 0.965 8 | 0.965 8 | 0.489 2 | 0.967 7 | 0.949 2 | 0.607 1 | 0.515 0 | 0.966 0 | 0.749 5 | 0.516 9 |
car | 0.407 5 | 0.477 7 | 0.825 9 | 0.957 8 | 0.349 5 | 0.666 0 | 0.621 1 | 0.842 5 | -0.063 6 | 0.614 1 |
page_blocks | 0.716 3 | 0.752 0 | 0.764 1 | 0.891 1 | 0.850 9 | 0.610 3 | 0.675 3 | 0.735 1 | 0.604 3 | 0.674 8 |
flare | 0.416 3 | 0.734 3 | 0.359 2 | 0.687 3 | 0.724 2 | 0.693 7 | 0.572 0 | 0.665 3 | 0.386 1 | 0.477 2 |
satimage | 0.913 2 | 0.936 4 | 0.707 3 | 0.982 6 | 0.917 6 | 0.916 7 | 0.830 4 | 0.898 5 | 0.202 3 | 0.327 1 |
Tab. 8 Kappa values of classic methods on experimental datasets
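The multi-class metrics reported in Tables 4-8 generalize the binary definitions: macro-averaged F1, Cohen's Kappa, and a G-mean taken as the geometric mean of the per-class recalls. A sketch on a hypothetical 3-class labeling, assuming scikit-learn and macro averaging (the paper's exact evaluation protocol may differ):

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score, recall_score

# Hypothetical imbalanced 3-class ground truth (8/4/2 instances) and predictions.
y_true = np.array([0]*8 + [1]*4 + [2]*2)
y_pred = np.array([0]*7 + [1] + [1]*3 + [0] + [2, 1])

acc   = accuracy_score(y_true, y_pred)
f1    = f1_score(y_true, y_pred, average="macro")   # unweighted mean of per-class F1
kappa = cohen_kappa_score(y_true, y_pred)           # agreement corrected for chance

# Multi-class G-mean: geometric mean of the per-class recalls.
recalls = recall_score(y_true, y_pred, average=None)
g_mean  = float(np.prod(recalls) ** (1.0 / len(recalls)))

print(round(acc, 4), round(f1, 4), round(kappa, 4), round(g_mean, 4))
```

Because macro averaging and the geometric mean weight every class equally, errors on the smallest class pull these scores down far more than they pull down accuracy, which matches the gaps visible between Table 4 and Tables 5-8.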
1 | SHILASKAR S, GHATOL A. Diagnosis system for imbalanced multi‑minority medical dataset[J]. Soft Computing, 2019, 23(13): 4789-4799. 10.1007/s00500-018-3133-x |
2 | LANGO M. Tackling the problem of class imbalance in multi‑class sentiment classification: an experimental study[J]. Foundations of Computing and Decision Sciences, 2019, 44(2): 151-178. 10.2478/fcds-2019-0009 |
3 | KRAWCZYK B, McINNES B T, CANO A. Sentiment classification from multi‑class imbalanced twitter data using binarization[C]// Proceedings of the 2017 International Conference on Hybrid Artificial Intelligence Systems, LNCS 10334. Cham: Springer, 2017: 26-37. |
4 | KULKARNI R, VINTRÓ M, KAPETANAKIS S, et al. Performance comparison of popular text vectorising models on multi‑class email classification[C]// Proceedings of the 2018 SAI Intelligent Systems Conference, AISC 868. Cham: Springer, 2019: 567-578. |
5 | DORADO‑MORENO M, GUTIÉRREZ P A, CORNEJO‑BUENO L, et al. Ordinal multi‑class architecture for predicting wind power ramp events based on reservoir computing[J]. Neural Processing Letters, 2020, 52(1): 57-74. 10.1007/s11063-018-9922-5 |
6 | YUAN Y L, HUO L W, HOGREFE D. Two layers multi‑class detection method for network intrusion detection system[C]// Proceedings of the 2017 IEEE Symposium on Computers and Communications. Piscataway: IEEE, 2017: 767-772. 10.1109/iscc.2017.8024620 |
7 | BENCHAJI I, DOUZI S, OUAHIDI B EL. Using genetic algorithm to improve classification of imbalanced datasets for credit card fraud detection[C]// Proceedings of the 2019 International Conference on Advanced Information Technology, Services and Systems, LNNS 66. Cham: Springer, 2019: 220-229. |
8 | LI Y X, CHAI Y, HU Y Q, et al. Review of imbalanced data classification methods[J]. Control and Decision, 2019, 34(4): 673-688. 10.13195/j.kzyjc.2018.0865 |
9 | SAHARE M, GUPTA H. A review of multi‑class classification for imbalanced data[J]. International Journal of Advanced Computer Research, 2012, 2(5): 160-164. |
10 | TANHA J, ABDI Y, SAMADI N, et al. Boosting methods for multi‑class imbalanced data classification: an experimental review[J]. Journal of Big Data, 2020, 7: No.70. 10.1186/s40537-020-00349-y |
11 | KAUR H, PANNU H S, MALHI A K. A systematic review on imbalanced data challenges in machine learning[J]. ACM Computing Surveys, 2019, 52(4): No.79. 10.1145/3343440 |
12 | KRAWCZYK B, KOZIARSKI M, WOŹNIAK M. Radial‑based oversampling for multiclass imbalanced data classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(8): 2818-2831. 10.1109/tnnls.2019.2913673 |
13 | ZHANG Z L, KRAWCZYK B, GARCÌA S, et al. Empowering one‑vs‑one decomposition with ensemble learning for multi‑class imbalanced data[J]. Knowledge‑Based Systems, 2016, 106: 251-263. 10.1016/j.knosys.2016.05.048 |
14 | RODRÍGUEZ J J, DÍEZ‑PASTOR J F, ARNAIZ‑GONZÁLEZ Á, et al. Random balance ensembles for multiclass imbalance learning[J]. Knowledge‑Based Systems, 2020, 193: No.105434. 10.1016/j.knosys.2019.105434 |
15 | ŻAK M, WOŹNIAK M. Performance analysis of binarization strategies for multi‑class imbalanced data classification[C]// Proceedings of the 2020 International Conference on Computational Science, LNCS 12140. Cham: Springer, 2020: 141-155. |
16 | ZHANG Z L, LUO X G, GONZÁLEZ S, et al. DRCW‑ASEG: One‑versus‑one distance‑based relative competence weighting with adaptive synthetic example generation for multi‑class imbalanced datasets[J]. Neurocomputing, 2018, 285: 176-187. 10.1016/j.neucom.2018.01.039 |
17 | LIANG L J, JIN T T, HUO M Y. Feature identification from imbalanced data sets for diagnosis of cardiac arrhythmia[C]// Proceedings of the 11th International Symposium on Computational Intelligence and Design. Piscataway: IEEE, 2018: 52-55. 10.1109/iscid.2018.10113 |
18 | CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: synthetic minority over‑sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357. 10.1613/jair.953 |
19 | LIU X Y, WU J X, ZHOU Z H. Exploratory undersampling for class‑imbalance learning[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009, 39(2): 539-550. 10.1109/tsmcb.2008.2007853 |
20 | BARANDELA R, VALDOVINOS R M, SÁNCHEZ J S. New applications of ensembles of classifiers[J]. Pattern Analysis and Applications, 2003, 6(3): 245-256. 10.1007/s10044-003-0192-z |
21 | WANG S, YAO X. Diversity analysis on imbalanced data sets by using ensemble models[C]// Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Data Mining. Piscataway: IEEE, 2009: 324-331. 10.1109/cidm.2009.4938667 |
22 | SEIFFERT C, KHOSHGOFTAAR T M, van HULSE J, et al. RUSBoost: a hybrid approach to alleviating class imbalance[J]. IEEE Transactions on Systems, Man, and Cybernetics — Part A: Systems and Humans, 2010, 40(1): 185-197. 10.1109/tsmca.2009.2029559 |
23 | CHAWLA N V, LAZAREVIC A, HALL L O, et al. SMOTEBoost: improving prediction of the minority class in boosting[C]// Proceedings of the 2003 European Conference on Principles of Data Mining and Knowledge Discovery, LNCS 2838. Berlin: Springer, 2003: 107-119. |
24 | JEGIERSKI H, SAGANOWSKI S. An “outside the box” solution for imbalanced data classification[J]. IEEE Access, 2020, 8: 125191-125209. 10.1109/access.2020.3007801 |
25 | SEN A, ISLAM M M, MURASE K, et al. Binarization with boosting and oversampling for multiclass classification[J]. IEEE Transactions on Cybernetics, 2016, 46(5): 1078-1091. 10.1109/tcyb.2015.2423295 |
26 | JIANG C Q, LIU Y, DING Y, et al. Capturing helpful reviews from social media for product quality improvement: a multi‑class classification approach[J]. International Journal of Production Research, 2017, 55(12): 3528-3541. 10.1080/00207543.2017.1304664 |
27 | SÁEZ J A, GALAR M, LUENGO J, et al. Analyzing the presence of noise in multi‑class problems: alleviating its influence with the One‑vs‑One decomposition[J]. Knowledge and Information Systems, 2014, 38(1): 179-206. 10.1007/s10115-012-0570-1 |
28 | MURPHEY Y L, WANG H X, OU G B, et al. OAHO: an effective algorithm for multi‑class learning from imbalanced data[C]// Proceedings of the 2007 International Joint Conference on Neural Networks. Piscataway: IEEE, 2007: 406-411. 10.1109/ijcnn.2007.4370991 |
29 | HAN H, WANG W Y, MAO B H. Borderline‑SMOTE: a new over‑sampling method in imbalanced data sets learning[C]// Proceedings of the 2005 International Conference on Intelligent Computing, LNCS 3644. Berlin: Springer, 2005: 878-887. |
30 | HE H B, BAI Y, GARCIA E A, et al. ADASYN: adaptive synthetic sampling approach for imbalanced learning[C]// Proceedings of the 2008 IEEE International Joint Conference on Neural Network (IEEE World Congress on Computational Intelligence). Piscataway: IEEE, 2008: 1322-1328. 10.1109/ijcnn.2008.4633969 |
31 | GALAR M, FERNÁNDEZ A, BARRENECHEA E, et al. DRCW‑OVO: distance‑based relative competence weighting combination for One‑vs‑One strategy in multi‑class problems[J]. Pattern Recognition, 2015, 48(1): 28-42. 10.1016/j.patcog.2014.07.023 |
32 | ZHANG J H, CUI X Q, LI J R, et al. Imbalanced classification of mental workload using a cost‑sensitive majority weighted minority oversampling strategy[J]. Cognition, Technology and Work, 2017, 19(4): 633-653. 10.1007/s10111-017-0447-x |
33 | PATIL S S, SONAVANE S P. Enriched over_sampling techniques for improving classification of imbalanced big data[C]// Proceedings of the IEEE 3rd International Conference on Big Data Computing Service and Applications. Piscataway: IEEE, 2017: 1-10. 10.1109/bigdataservice.2017.19 |
34 | RIVERA W, ASPAROUHOV O. Safe level OUPS for improving target concept learning in imbalanced data sets[C]// Proceedings of the 2015 IEEE SoutheastCon. Piscataway: IEEE, 2015: 1-8. 10.1109/secon.2015.7132940 |
35 | MATHEW J, PANG C K, LUO M, et al. Classification of imbalanced data by oversampling in kernel space of support vector machines[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(9): 4065-4076. 10.1109/tnnls.2017.2751612 |
36 | ZAREAPOOR M, SHAMSOLMOALI P, YANG J. Oversampling adversarial network for class‑imbalanced fault diagnosis[J]. Mechanical Systems and Signal Processing, 2021, 149: No.107175. 10.1016/j.ymssp.2020.107175 |
37 | XIA M, LI T, XU L, et al. Fault diagnosis for rotating machinery using multiple sensors and convolutional neural networks[J]. IEEE‑ASME Transactions on Mechatronics, 2018, 23(1): 101-110. 10.1109/tmech.2017.2728371 |
38 | LIU H, ZHOU J Z, XU Y H, et al. Unsupervised fault diagnosis of rolling bearings using a deep neural network based on generative adversarial networks[J]. Neurocomputing, 2018, 315: 412-424. 10.1016/j.neucom.2018.07.034 |
39 | YU H Y, CHEN C Y, YANG H M. Two‑stage game strategy for multiclass imbalanced data online prediction[J]. Neural Processing Letters, 2020, 52(3): 2493-2512. 10.1007/s11063-020-10358-w |
40 | LEE J, PARK K. GAN‑based imbalanced data intrusion detection system[J]. Personal and Ubiquitous Computing, 2021, 25(1): 121-128. 10.1007/s00779-019-01332-y |
41 | SHAMSOLMOALI P, ZAREAPOOR M, SHEN L L, et al. Imbalanced data learning by minority class augmentation using capsule adversarial networks[J]. Neurocomputing, 2020, 459: 481-493. 10.1016/j.neucom.2020.01.119 |
42 | POUYANFAR S, CHEN S C, SHYU M L. Deep spatio‑temporal representation learning for multi‑class imbalanced data classification[C]// Proceedings of the 2018 IEEE International Conference on Information Reuse and Integration. Piscataway: IEEE, 2018: 386-393. 10.1109/iri.2018.00064 |
43 | LIU Q J, MA G J, CHENG C. Data fusion generative adversarial network for multi‑class imbalanced fault diagnosis of rotating machinery[J]. IEEE Access, 2020, 8: 70111-70124. 10.1109/access.2020.2986356 |
44 | YANG X B, KUANG Q M, ZHANG W S, et al. AMDO: an over‑sampling technique for multi‑class imbalanced problems[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(9): 1672-1685. 10.1109/tkde.2017.2761347 |
45 | ABDI L, HASHEMI S. To combat multi‑class imbalanced problems by means of over‑sampling techniques[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(1): 238-251. 10.1109/tkde.2015.2458858 |
46 | LI Q M, SONG Y J, ZHANG J, et al. Multiclass imbalanced learning with one‑versus‑one decomposition and spectral clustering[J]. Expert Systems with Applications, 2020, 147: No.113152. 10.1016/j.eswa.2019.113152 |
47 | CHEN X T, ZHANG L, WEI X H, et al. An effective method using clustering‑based adaptive decomposition and editing‑based diversified oversamping for multi‑class imbalanced datasets[J]. Applied Intelligence, 2021, 51(4): 1918-1933. 10.1007/s10489-020-01883-1 |
48 | SANTOSO B, WIJAYANTO H, NOTODIPUTRO K A, et al. K‑Neighbor over‑sampling with cleaning data: a new approach to improve classification performance in data sets with class imbalance[J]. Applied Mathematical Sciences, 2018, 12(10): 449-460. 10.12988/ams.2018.8231 |
49 | KOZIARSKI M, WOŹNIAK M, KRAWCZYK B. Combined cleaning and resampling algorithm for multi‑class imbalanced data with label noise[J]. Knowledge‑Based Systems, 2020, 204: No.106223. 10.1016/j.knosys.2020.106223 |
50 | WU Q, LIN Y P, ZHU T F, et al. HUSBoost: a hubness‑aware boosting for high‑dimensional imbalanced data classification[C]// Proceedings of the 2019 International Conference on Machine Learning and Data Engineering. Piscataway: IEEE, 2019: 36-41. 10.1109/icmlde49015.2019.00018 |
51 | RAYHAN F, AHMED S, MAHBUB A, et al. CUSBoost: cluster‑based under‑sampling with boosting for imbalanced classification[C]// Proceedings of the 2nd International Conference on Computational Systems and Information Technology for Sustainable Solution. Piscataway: IEEE, 2017: 1-5. 10.1109/csitss.2017.8447534 |
52 | LI Y, WANG J, WANG S G, et al. Local dense mixed region cutting + global rebalancing: a method for imbalanced text sentiment classification[J]. International Journal of Machine Learning and Cybernetics, 2019, 10(7): 1805-1820. 10.1007/s13042-018-0858-x |
53 | LI L S, HE H B, LI J. Entropy‑based sampling approaches for multi‑class imbalanced problems[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(11): 2159-2170. 10.1109/tkde.2019.2913859 |
54 | GALAR M, FERNÁNDEZ A, BARRENECHEA E, et al. EUSBoost: enhancing ensembles for highly imbalanced data‑sets by evolutionary undersampling[J]. Pattern Recognition, 2013, 46(12): 3460-3471. 10.1016/j.patcog.2013.05.006 |
55 | GARCÍA S, HERRERA F. Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy[J]. Evolutionary Computation, 2009, 17(3): 275-306. 10.1162/evco.2009.17.3.275 |
56 | FERNANDES E R Q, DE CARVALHO A C P L F. Evolutionary inversion of class distribution in overlapping areas for multi‑class imbalanced learning[J]. Information Sciences, 2019, 494: 141-154. 10.1016/j.ins.2019.04.052 |
57 | DEB K, PRATAP A, AGARWAL S, et al. A fast and elitist multiobjective genetic algorithm: NSGA‑Ⅱ[J]. IEEE Transactions on Evolutionary Computation, 2002, 6(2): 182-197. 10.1109/4235.996017 |
58 | GOLDBERG D E. Genetic Algorithms in Search, Optimization, and Machine Learning[M]. Boston: Addison‑Wesley Professional, 1989: 95-99. 10.5860/choice.27-0936 |
59 | LIU Z, TANG D Y, CAI Y M, et al. A hybrid method based on ensemble WELM for handling multi class imbalance in cancer microarray data[J]. Neurocomputing, 2017, 266: 641-650. 10.1016/j.neucom.2017.05.066 |
60 | SARIKAYA A, KILIÇ B G. A class‑specific intrusion detection model: hierarchical multi‑class IDS model[J]. SN Computer Science, 2020, 1(4): No.202. 10.1007/s42979-020-00213-z |
61 | LI J T, WANG Y Y, SONG X K, et al. Adaptive multinomial regression with overlapping groups for multi‑class classification of lung cancer[J]. Computers in Biology and Medicine, 2018, 100: 1-9. 10.1016/j.compbiomed.2018.06.014 |
62 | DUFRENOIS F. A one‑class kernel fisher criterion for outlier detection[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(5): 982-994. 10.1109/tnnls.2014.2329534 |
63 | BELLINGER C, SHARMA S, JAPKOWICZ N. One‑class versus binary classification: which and when?[C]// Proceedings of the 11th International Conference on Machine Learning and Applications. Piscataway: IEEE, 2012: 102-106. 10.1109/icmla.2012.212 |
64 | HEMPSTALK K, FRANK E. Discriminating against new classes: one‑class versus multi‑class classification[C]// Proceedings of the 2008 Australasian Joint Conference on Artificial Intelligence, LNCS 5360. Berlin: Springer, 2008: 325-336. |
65 | KRAWCZYK B, WOŹNIAK M, HERRERA F. On the usefulness of one‑class classifier ensembles for decomposition of multi‑class problems[J]. Pattern Recognition, 2015, 48(12): 3969-3982. 10.1016/j.patcog.2015.06.001 |
66 | PÉREZ‑SÁNCHEZ B, FONTENLA‑ROMERO O, SÁNCHEZ‑MAROÑO N. Selecting target concept in one‑class classification for handling class imbalance problem[C]// Proceedings of the 2015 International Joint Conference on Neural Networks. Piscataway: IEEE, 2015: 1-8. 10.1109/ijcnn.2015.7280661 |
67 | KRAWCZYK B, GALAR M, WOŹNIAK M, et al. Dynamic ensemble selection for multi‑class classification with one‑class classifiers[J]. Pattern Recognition, 2018, 83: 34-51. 10.1016/j.patcog.2018.05.015 |
68 | GAO L, ZHANG L, LIU C, et al. Handling imbalanced medical image data: a deep‑learning‑based one‑class classification approach[J]. Artificial Intelligence in Medicine, 2020, 108: No.101935. 10.1016/j.artmed.2020.101935 |
69 | WAN J W, YANG M. Survey on cost‑sensitive learning method[J]. Journal of Software, 2020, 31(1): 113-136. 10.13328/j.cnki.jos.005871 |
70 | ZHANG Z L, LUO X G, GARCÍA S, et al. Cost‑sensitive back‑propagation neural networks with binarization techniques in addressing multi‑class problems and non‑competent classifiers[J]. Applied Soft Computing, 2017, 56: 357-367. 10.1016/j.asoc.2017.03.016 |
71 | LING C X, SHENG V S. Cost‑sensitive learning and the class imbalance problem[M]// Encyclopedia of Machine Learning. Boston: Springer, 2010: 171, 231-235. 10.4018/978-1-60566-010-3.ch054 |
72 | DOMINGOS P. MetaCost: a general method for making classifiers cost‑sensitive[C]// Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 1999: 155-164. 10.1145/312129.312220 |
73 | IRANMEHR A, MASNADI‑SHIRAZI H, VASCONCELOS N. Cost‑sensitive support vector machines[J]. Neurocomputing, 2019, 343: 50-64. 10.1016/j.neucom.2018.11.099 |
74 | GU B, SHENG V S, TAY K Y, et al. Cross validation through two‑dimensional solution surface for cost‑sensitive SVM[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1103-1121. 10.1109/tpami.2016.2578326 |
75 | ZHANG C, TAN K C, LI H Z, et al. A cost‑sensitive deep belief network for imbalanced classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(1): 109-122. 10.1109/tnnls.2018.2832648 |
76 | LANGO M, STEFANOWSKI J. Multi‑class and feature selection extensions of roughly balanced bagging for imbalanced data[J]. Journal of Intelligent Information Systems, 2018, 50(1): 97-127. 10.1007/s10844-017-0446-7 |
77 | HIDO S, KASHIMA H, TAKAHASHI Y. Roughly balanced bagging for imbalanced data[J]. Statistical Analysis and Data Mining, 2009, 2(5/6): 412-426. 10.1002/sam.10061 |
78 | TAHERKHANI A, COSMA G, McGINNITY T M. AdaBoost‑CNN: an adaptive boosting algorithm for convolutional neural networks to classify multi‑class imbalanced datasets using transfer learning[J]. Neurocomputing, 2020, 404: 351-366. 10.1016/j.neucom.2020.03.064 |
79 | DÍEZ‑PASTOR J F, RODRÍGUEZ J J, GARCÍA‑OSORIO C, et al. Random Balance: ensembles of variable priors classifiers for imbalanced data[J]. Knowledge‑Based Systems, 2015, 85: 96-111. 10.1016/j.knosys.2015.04.022 |
80 | FERNÁNDEZ‑BALDERA A, BUENAPOSADA J M, BAUMELA L. BAdaCost: multi‑class Boosting with costs[J]. Pattern Recognition, 2018, 79: 467-479. 10.1016/j.patcog.2018.02.022 |
81 | SCHWENKER F. Ensemble methods: foundations and algorithms [Book Review][J]. IEEE Computational Intelligence Magazine, 2013, 8(1): 77-79. 10.1109/mci.2012.2228600 |
82 | JOHNSON J M, KHOSHGOFTAAR T M. Survey on deep learning with class imbalance[J]. Journal of Big Data, 2019, 6: No.27. 10.1186/s40537-019-0192-5 |
83 | RENDÓN E, ALEJO R, CASTORENA C, et al. Data sampling methods to deal with the big data multi‑class imbalance problem[J]. Applied Sciences, 2020, 10(4): No.1276. 10.3390/app10041276 |
84 | WILSON D L. Asymptotic properties of nearest neighbor rules using edited data[J]. IEEE Transactions on Systems, Man and Cybernetics, 1972, SMC‑2(3): 408-421. 10.1109/tsmc.1972.4309137 |
85 | TOMEK I. Two modifications of CNN[J]. IEEE Transactions on Systems, Man and Cybernetics, 1976, SMC‑6(11): 769-772. 10.1109/tsmc.1976.4309452 |
86 | RAGHUWANSHI B S, SHUKLA S. Generalized class‑specific kernelized extreme learning machine for multiclass imbalanced learning[J]. Expert Systems with Applications, 2019, 121: 244-255. 10.1016/j.eswa.2018.12.024 |
87 | RAGHUWANSHI B S, SHUKLA S. Class‑specific kernelized extreme learning machine for binary class imbalance learning[J]. Applied Soft Computing, 2018, 73: 1026-1038. 10.1016/j.asoc.2018.10.011 |
88 | MOSLEY L S D. A balanced approach to the multi‑class imbalance problem[D]. Ames, IA: Iowa State University, 2013: 15-25. |
89 | SOKOLOVA M, LAPALME G. A systematic analysis of performance measures for classification tasks[J]. Information Processing and Management, 2009, 45(4): 427-437. 10.1016/j.ipm.2009.03.002 |
90 | MORTAZ E. Imbalance accuracy metric for model selection in multi‑class imbalance classification problems[J]. Knowledge‑Based Systems, 2020, 210: No.106490. 10.1016/j.knosys.2020.106490 |
91 | VIERA A J, GARRETT J M. Understanding interobserver agreement: the kappa statistic[J]. Family Medicine, 2005, 37(5): 360-363. |
92 | WEI J M, YUAN X J, HU Q H, et al. A novel measure for evaluating classifiers[J]. Expert Systems with Applications, 2010, 37(5): 3799-3809. 10.1016/j.eswa.2009.11.040 |
93 | BRANCO P, TORGO L, RIBEIRO R P. Relevance‑based evaluation metrics for multi‑class imbalanced domains[C]// Proceedings of the 2017 Pacific‑Asia Conference on Knowledge Discovery and Data Mining, LNCS 10234. Cham: Springer, 2017: 698-710. |
94 | GORODKIN J. Comparing two K‑category assignments by a K‑category correlation coefficient[J]. Computational Biology and Chemistry, 2004, 28(5/6): 367-374. 10.1016/j.compbiolchem.2004.09.006 |
95 | MATTHEWS B W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme[J]. Biochimica et Biophysica Acta (BBA) — Protein Structure, 1975, 405(2): 442-451. 10.1016/0005-2795(75)90109-9 |
96 | GARCÍA‑PEDRAJAS N, ORTIZ‑BOYER D. Improving multiclass pattern recognition by the combination of two strategies[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(6): 1001-1006. 10.1109/tpami.2006.123 |
97 | FERNÁNDEZ A, LÓPEZ V, GALAR M, et al. Analysing the classification of imbalanced data‑sets with multiple classes: binarization techniques and ad‑hoc approaches[J]. Knowledge‑Based Systems, 2013, 42: 97-110. 10.1016/j.knosys.2013.01.018 |
98 | RAMENTOL E, VLUYMANS S, VERBIEST N, et al. IFROWANN: imbalanced fuzzy‑rough ordered weighted average nearest neighbor classification[J]. IEEE Transactions on Fuzzy Systems, 2015, 23(5): 1622-1637. 10.1109/tfuzz.2014.2371472 |
99 | BI J J, ZHANG C S. An empirical comparison on state‑of‑the‑art multi‑class imbalance learning algorithms and a new diversified ensemble learning scheme[J]. Knowledge‑Based Systems, 2018, 158: 81-93. 10.1016/j.knosys.2018.05.037 |
100 | KANG S, CHO S, KANG P. Constructing a multi‑class classifier using one‑against‑one approach with different binary classifiers[J]. Neurocomputing, 2015, 149(Pt B): 677-682. 10.1016/j.neucom.2014.08.006 |