Journal of Computer Applications, 2023, Vol. 43, Issue (4): 1086-1093. DOI: 10.11772/j.issn.1001-9081.2022040490
Special Issue: Data science and technology
Yi JIANG¹, Shuping WU¹, Kun HU², Linbo LONG¹
Received: 2022-04-14
Revised: 2022-06-08
Accepted: 2022-06-13
Online: 2022-07-01
Published: 2023-04-10
Contact: Shuping WU
About author:
JIANG Yi, born in 1969 in Anlu, Hubei, Ph. D., professor-level senior engineer, CCF member. His research interests include computer architecture, software engineering, big data, and network security.
Yi JIANG, Shuping WU, Kun HU, Linbo LONG. Imbalanced data classification method based on Lasso and constructive covering algorithm[J]. Journal of Computer Applications, 2023, 43(4): 1086-1093.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022040490
Dataset | Abbreviation | Samples | Attributes | Imbalance ratio |
---|---|---|---|---|
Pima | ||||
D1 | 10 000 | 13 | 20.30 | |
D2 | 10 900 | 60 | 49.05 |
Tab. 1 Experiment datasets
True label | Predicted positive | Predicted negative |
---|---|---|
Positive | TP (True Positive) | FN (False Negative) |
Negative | FP (False Positive) | TN (True Negative) |
Tab. 2 Confusion matrix
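The F1-score and G-mean reported in the later tables are conventionally derived from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Derive common imbalanced-learning metrics from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # G-mean is the geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "g_mean": g_mean}


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=40, fn=10, fp=20, tn=130)
print(metrics)
```

Because G-mean multiplies the two per-class rates, a classifier that ignores the minority class scores 0 regardless of how well it handles the majority class, which is why it is a common choice for imbalanced data.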
Dataset | L-CCSmote | S-Enn | S-Tomek | B1-S | Adasyn | OSS | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
LR | SVM | LR | SVM | LR | SVM | LR | SVM | LR | SVM | LR | SVM | |
AR | 1.53 | 2.13 | 3.40 | 3.13 | 2.60 | 3.27 | 4.13 | 2.87 | 2.73 | 3.27 | ||
G1 | 0.627 5 | 0.636 4 | 0.571 4 | 0.590 9 | 0.583 3 | 0.622 2 | 0.507 9 | 0.539 7 | 0.577 8 | 0.666 7 | ||
P1 | 0.671 4 | 0.671 1 | 0.629 6 | 0.650 6 | 0.643 4 | 0.662 1 | 0.657 3 | 0.653 8 | 0.615 4 | 0.633 1 | ||
V3 | 0.719 4 | 0.666 7 | 0.653 6 | 0.640 5 | 0.696 3 | 0.657 3 | 0.680 9 | 0.653 1 | 0.660 4 | 0.516 1 | ||
S0 | 0.981 6 | 0.981 6 | 0.981 6 | 0.975 3 | 0.981 6 | 0.981 6 | 0.975 3 | 0.975 3 | 0.981 6 | |||
G6 | 0.875 0 | 0.923 1 | 0 | 0.750 0 | 0.923 1 | 0.750 0 | 0.750 0 | 0.705 9 | 0.750 0 | 0.923 1 | ||
Y3 | 0.800 0 | 0.702 7 | 0.695 7 | 0.712 9 | 0.697 2 | 0.699 0 | 0.683 8 | 0.704 8 | 0.672 3 | 0.826 7 | ||
E3 | 0.782 6 | 0.666 7 | 0.782 6 | 0.636 4 | 0.782 6 | 0.608 7 | 0.600 0 | 0.695 7 | 0.571 4 | 0.400 0 | ||
V0 | 0.893 6 | 0.808 5 | 0.808 5 | 0.791 7 | 0.977 8 | 0.782 6 | 0.926 8 | |||||
C0 | 0.750 0 | 0.857 1 | 0.600 0 | 0.600 0 | 0.750 0 | 0.545 5 | 0.857 1 | 0.500 0 | ||||
E0 | 0.800 0 | 0.727 3 | 0.615 4 | 0.666 7 | 0.666 7 | 0.888 9 | 0.571 4 | 0.666 7 | 0.800 0 | 0.888 9 | ||
A9 | 0.769 2 | 0.645 2 | 0.555 6 | 0.320 0 | 0.769 2 | 0.645 2 | 0.645 2 | 0.500 0 | 0.705 9 | 0.285 7 | ||
Y5 | 0.523 8 | 0.600 0 | 0.550 0 | 0.600 0 | 0.536 6 | 0.600 0 | 0.536 6 | 0.428 6 | 0.666 7 | |||
A17 | 0.381 0 | 0.307 7 | 0.289 5 | 0.333 3 | 0.289 9 | 0.328 4 | 0.320 0 | 0.274 0 | 0.333 3 | 0.125 0 | ||
D1 | 0.422 9 | 0.452 2 | 0.434 0 | 0.507 4 | 0.345 7 | 0.428 6 | 0.290 0 | 0.373 4 | 0.749 2 | 0.823 5 | ||
D2 | 0.894 7 | 0.938 1 | 0.953 3 | 0.972 5 | 0.585 4 | 0.953 3 | 0.608 2 | 0.972 5 | 0.923 1 | 0.772 7 |
Tab. 3 Comparison of F1-scores of different algorithms on two classification algorithms
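The AR row in the table reads as an average rank of each sampling method across the datasets. The sketch below shows one plausible way to compute such a value, assuming that a higher score is better, that rank 1 is best, and that tied methods share the mean of the positions they occupy. The tie rule, the function name `average_ranks`, and the scores are assumptions, not details confirmed by the paper.

```python
from collections import defaultdict


def average_ranks(scores):
    """scores[dataset][method] -> metric value; returns method -> average rank."""
    totals = defaultdict(float)
    for per_method in scores.values():
        # Sort methods from best to worst score on this dataset.
        ordered = sorted(per_method.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j over the run of methods tied with position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared = ((i + 1) + (j + 1)) / 2  # mean of the 1-based positions
            for k in range(i, j + 1):
                totals[ordered[k][0]] += shared
            i = j + 1
    n_datasets = len(scores)
    return {method: total / n_datasets for method, total in totals.items()}


# Hypothetical scores for three methods on two datasets.
example_scores = {
    "ds1": {"A": 0.80, "B": 0.70, "C": 0.60},
    "ds2": {"A": 0.90, "B": 0.90, "C": 0.50},
}
ranks = average_ranks(example_scores)
print(ranks)
```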
Dataset | L-CCSmote | S-Enn | S-Tomek | B1-S | Adasyn | OSS | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
LR | SVM | LR | SVM | LR | SVM | LR | SVM | LR | SVM | LR | SVM | |
AR | 1.47 | 1.67 | 2.87 | 2.93 | 2.60 | 3.33 | 3.40 | 3.27 | 4.13 | 3.40 | ||
G1 | 0.639 8 | 0.670 7 | 0.654 1 | 0.695 9 | 0.521 1 | 0.704 4 | 0.561 7 | 0.655 8 | 0.720 3 | 0.735 8 | ||
P1 | 0.746 7 | 0.744 6 | 0.704 6 | 0.723 0 | 0.723 3 | 0.738 2 | 0.734 2 | 0.728 6 | 0.706 5 | 0.716 4 | ||
V3 | 0.858 5 | 0.817 6 | 0.814 5 | 0.801 9 | 0.833 3 | 0.808 2 | 0.827 0 | 0.808 2 | 0.773 6 | 0.676 1 | ||
S0 | 0.986 8 | 0.980 7 | 0.986 8 | 0.990 9 | 0.980 7 | 0.990 9 | 0.980 7 | 0.986 8 | ||||
G6 | 0.978 7 | 0.928 6 | 0.896 7 | 0.928 6 | 0.896 7 | 0.896 7 | 0.886 0 | 0.896 7 | 0.928 6 | |||
Y3 | 0.939 2 | 0.902 7 | 0.918 0 | 0.899 6 | 0.933 3 | 0.910 3 | 0.930 2 | 0.873 5 | 0.855 2 | 0.930 1 | ||
E3 | 0.966 7 | 0.897 8 | 0.946 7 | 0.966 7 | 0.848 9 | 0.966 7 | 0.844 2 | 0.920 0 | 0.715 6 | 0.646 7 | ||
V0 | 0.968 4 | 0.918 5 | 0.918 5 | 0.916 3 | 0.997 8 | 0.895 8 | 0.931 8 | |||||
C0 | 0.975 6 | 0.987 8 | 0.975 6 | 0.939 0 | 0.987 8 | 0.821 1 | 0.666 7 | |||||
E0 | 0.892 3 | 0.884 6 | 0.869 2 | 0.876 9 | 0.876 9 | 0.900 0 | 0.861 5 | 0.876 9 | 0.892 3 | 0.900 0 | ||
A9 | 0.940 0 | 0.925 5 | 0.910 9 | 0.773 5 | 0.940 0 | 0.894 6 | 0.925 5 | 0.899 3 | 0.772 7 | 0.588 0 | ||
Y5 | 0.972 2 | 0.979 2 | 0.895 2 | 0.895 2 | 0.973 6 | 0.895 2 | 0.973 6 | 0.636 4 | 0.771 3 | |||
A17 | 0.868 4 | 0.855 3 | 0.822 8 | 0.794 7 | 0.834 2 | 0.830 7 | 0.857 9 | 0.791 2 | 0.600 0 | 0.533 3 | ||
D1 | 0.897 6 | 0.903 8 | 0.894 3 | 0.893 7 | 0.885 0 | 0.892 5 | 0.861 6 | 0.873 9 | 0.846 6 | 0.886 2 | ||
D2 | 0.970 5 | 0.971 8 | 0.990 4 | 0.980 9 | 0.932 8 | 0.971 8 | 0.969 3 | 0.990 4 | 0.944 1 | 0.814 8 |
Tab. 4 Comparison of AUC values of different algorithms on two classification algorithms
Dataset | L-CCSmote | S-Enn | S-Tomek | B1-S | Adasyn | OSS | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
LR | SVM | LR | SVM | LR | SVM | LR | SVM | LR | SVM | LR | SVM | |
AR | 1.47 | 1.67 | 2.80 | 3.00 | 2.60 | 3.33 | 3.47 | 3.27 | 4.20 | 3.47 | ||
G1 | 0.632 5 | 0.670 5 | 0.648 9 | 0.695 8 | 0.410 4 | 0.704 4 | 0.452 2 | 0.655 8 | 0.693 7 | 0.735 8 | ||
P1 | 0.745 4 | 0.744 4 | 0.702 3 | 0.718 2 | 0.722 3 | 0.737 9 | 0.727 9 | 0.698 0 | 0.713 9 | |||
V3 | 0.854 3 | 0.812 9 | 0.804 2 | 0.792 5 | 0.831 6 | 0.804 3 | 0.823 3 | 0.802 3 | 0.765 3 | 0.638 2 | ||
S0 | 0.986 7 | 0.980 5 | 0.986 7 | 0.990 9 | 0.980 5 | 0.990 9 | 0.980 5 | 0.986 7 | ||||
G6 | 0.978 5 | 0.925 8 | 0.895 8 | 0.925 8 | 0.895 8 | 0.895 8 | 0.885 5 | 0.895 8 | 0.925 8 | |||
Y3 | 0.939 1 | 0.902 3 | 0.917 9 | 0.899 4 | 0.932 3 | 0.910 3 | 0.929 1 | 0.865 6 | 0.846 3 | 0.930 1 | ||
E3 | 0.952 2 | 0.897 7 | 0.938 0 | 0.945 2 | 0.845 9 | 0.839 8 | 0.923 8 | 0.805 5 | 0.565 7 | |||
V0 | 0.968 3 | 0.916 8 | 0.916 8 | 0.914 8 | 0.997 8 | 0.892 4 | 0.929 3 | |||||
C0 | 0.975 3 | 0.987 7 | 0.975 3 | 0.937 0 | 0.987 7 | 0.806 5 | 0.577 4 | |||||
E0 | 0.887 5 | 0.880 6 | 0.866 5 | 0.873 5 | 0.873 5 | 0.894 4 | 0.859 3 | 0.873 5 | 0.887 5 | 0.894 4 | ||
A9 | 0.939 5 | 0.925 3 | 0.910 9 | 0.772 1 | 0.939 5 | 0.891 3 | 0.925 3 | 0.899 3 | 0.738 5 | 0.425 2 | ||
Y5 | 0.971 8 | 0.978 9 | 0.891 9 | 0.891 9 | 0.973 3 | 0.891 9 | 0.973 3 | 0.522 3 | 0.737 5 | |||
A17 | 0.865 7 | 0.853 5 | 0.817 9 | 0.784 4 | 0.828 1 | 0.825 0 | 0.855 9 | 0.781 4 | 0.447 2 | 0.258 2 | ||
D1 | 0.897 5 | 0.903 2 | 0.894 3 | 0.893 7 | 0.883 1 | 0.892 5 | 0.857 1 | 0.873 9 | 0.834 1 | 0.879 5 | ||
D2 | 0.970 2 | 0.971 5 | 0.990 3 | 0.980 8 | 0.931 8 | 0.971 5 | 0.969 3 | 0.990 3 | 0.942 5 | 0.793 5 |
Tab. 5 Comparison of G-MEAN values of different algorithms on two classification algorithms
Algorithm | F1 | AUC | G-MEAN | |||
---|---|---|---|---|---|---|
LR | SVM | LR | SVM | LR | SVM | |
S-Enn | 0.003 5 | 0.041 3 | 0.002 3 | 0.018 6 | 0.002 3 | 0.018 6 |
S-Tomek | 0.004 7 | 0.209 4 | 0.003 7 | 0.007 6 | 0.002 4 | 0.007 6 |
B1-S | 0.003 7 | 0.729 9 | 0.003 7 | 0.025 8 | 0.002 3 | 0.021 9 |
Adasyn | 0.000 9 | 0.074 7 | 0.001 2 | 0.004 6 | 0.001 2 | 0.003 7 |
OSS | 0.080 0 | 0.182 3 | 0.001 5 | 0.012 1 | 0.001 5 | 0.012 1 |
Tab. 6 Wilcoxon signed rank test results
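For readers unfamiliar with the test behind these p-values, here is a minimal standard-library sketch of a two-sided Wilcoxon signed-rank test on paired per-dataset scores. It drops zero differences and uses the normal approximation without tie or continuity correction, so its p-values will not exactly match an exact-distribution implementation. The function name `wilcoxon_signed_rank` and the paired scores are hypothetical, and the paper does not state which variant it used.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, two-sided p-value by normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Under the null hypothesis the smaller rank sum is roughly normal.
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Hypothetical paired scores of two methods on six datasets.
proposed = [0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
baseline = [0.69, 0.73, 0.77, 0.81, 0.85, 0.89]
w_plus, w_minus, p_value = wilcoxon_signed_rank(proposed, baseline)
print(w_plus, w_minus, p_value)
```

A p-value below the chosen significance level (commonly 0.05) suggests that the difference between the paired methods is unlikely to be due to chance alone.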
Algorithm | F1 | AUC | G-MEAN | |||
---|---|---|---|---|---|---|
LR | SVM | LR | SVM | LR | SVM | |
S-Enn | 65.56 | 68.62 | 88.87 | 86.74 | 88.66 | 86.43 |
S-Tomek | 68.16 | 87.32 | 87.03 | |||
B1-S | 65.63 | 69.93 | 87.78 | 86.73 | ||
Adasyn | 61.03 | 68.35 | 87.71 | 86.18 | 86.86 | 85.22 |
OSS | 65.18 | 81.45 | 78.90 | 79.99 | 74.30 | |
L-CCSmote | 73.01 | 71.91 | 91.03 | 89.47 | 90.72 | 89.32 |
Tab. 7 Average evaluation indicators of different algorithms