Journal of Computer Applications, 2022, Vol. 42, Issue (2): 475-484. DOI: 10.11772/j.issn.1001-9081.2021050957
• Data science and technology •
Yiheng LI, Chenxi DU, Yanyan YANG, Xiangyu LI
Received: 2021-03-25
Revised: 2021-07-21
Accepted: 2021-07-21
Online: 2022-02-11
Published: 2022-02-10
Contact: Yanyan YANG
About author: LI Yiheng, born in 2001 in Linfen, Shanxi. His research interests include machine learning.
Yiheng LI, Chenxi DU, Yanyan YANG, Xiangyu LI. Feature selection algorithm for imbalanced data based on pseudo-label consistency[J]. Journal of Computer Applications, 2022, 42(2): 475-484.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021050957
No. | Dataset | I | F | IR | P/% | N/% | Data type |
---|---|---|---|---|---|---|---|
D1 | arrhythmia | 452 | 279 | 9.27 | 9.73 | 90.27 | mixed |
D2 | crx | 690 | 15 | 1.25 | 44.49 | 55.51 | mixed |
D3 | glass | 214 | 9 | 2.06 | 32.71 | 67.29 | numerical |
D4 | heart | 270 | 13 | 1.25 | 44.44 | 55.56 | mixed |
D5 | segmentation | 2308 | 19 | 6.02 | 14.25 | 85.75 | numerical |
D6 | tic-tac-toe | 958 | 9 | 1.89 | 34.66 | 65.34 | nominal |
D7 | wdbc | 569 | 30 | 1.68 | 37.26 | 62.74 | numerical |
D8 | wpbc | 198 | 33 | 3.21 | 23.74 | 76.26 | numerical |
D9 | yeast | 1484 | 8 | 2.46 | 28.91 | 71.09 | numerical |
D10 | zoo | 101 | 16 | 19.20 | 4.95 | 95.05 | nominal |
Tab. 1 Experimental datasets
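The IR column in Tab. 1 appears to be the ratio of the two class shares, N/P. The page does not state this, so reading P as the minority-class percentage, N as the majority-class percentage, and IR as N/P is an assumption inferred from the printed values. A minimal Python sketch that checks this reading against the table (the names `TABLE1` and `imbalance_ratio` are illustrative only):

```python
# Rows copied from Tab. 1: (dataset, P in %, N in %, IR as printed).
# Assumption: P is the minority-class share and N the majority-class share.
TABLE1 = [
    ("arrhythmia", 9.73, 90.27, 9.27),
    ("crx", 44.49, 55.51, 1.25),
    ("glass", 32.71, 67.29, 2.06),
    ("heart", 44.44, 55.56, 1.25),
    ("segmentation", 14.25, 85.75, 6.02),
    ("tic-tac-toe", 34.66, 65.34, 1.89),
    ("wdbc", 37.26, 62.74, 1.68),
    ("wpbc", 23.74, 76.26, 3.21),
    ("yeast", 28.91, 71.09, 2.46),
    ("zoo", 4.95, 95.05, 19.20),
]


def imbalance_ratio(p_percent, n_percent):
    """Assumed definition: majority share divided by minority share."""
    return n_percent / p_percent


for name, p, n, printed_ir in TABLE1:
    computed = imbalance_ratio(p, n)
    # Percentages are already rounded, so allow a small tolerance.
    status = "ok" if abs(computed - printed_ir) < 0.02 else "mismatch"
    print(f"{name:13s} computed={computed:6.2f} printed={printed_ir:6.2f} {status}")
```

Because the percentages are themselves rounded, the recomputed ratio can differ from the printed one in the last digit, which is why the check uses a tolerance rather than exact equality.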
Dataset | CFS | PLCFS | mRMR | Relief |
---|---|---|---|---|
D1 | 19 | 18 | 24 | 24 |
D2 | 13 | 13 | 10 | 12 |
D3 | 8 | 7 | 5 | 7 |
D4 | 12 | 11 | 10 | 10 |
D5 | 13 | 16 | 13 | 14 |
D6 | 10 | 8 | 8 | 7 |
D7 | 14 | 24 | 24 | 12 |
D8 | 15 | 16 | 17 | 11 |
D9 | 9 | 7 | 8 | 7 |
D10 | 10 | 11 | 14 | 13 |
Tab. 2 Numbers of features selected by four algorithms on 10 datasets
Dataset | micro-F1 (CFS) | micro-F1 (PLCFS) | micro-F1 (mRMR) | micro-F1 (Relief) | macro-F1 (CFS) | macro-F1 (PLCFS) | macro-F1 (mRMR) | macro-F1 (Relief) | G-Mean (CFS) | G-Mean (PLCFS) | G-Mean (mRMR) | G-Mean (Relief) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
D1 | 0.5550 | 0.5418 | 0.6592 | 0.5506 | 0.1001 | 0.0731 | 0.2378 | 0.1024 | 0.7253 | 0.7158 | 0.7968 | 0.7220 |
D2 | 0.8232 | 0.8246 | 0.8159 | 0.8464 | 0.6861 | 0.6876 | 0.6786 | 0.7132 | 0.8232 | 0.8246 | 0.8159 | 0.8464 |
D3 | 0.6143 | 0.6143 | 0.6422 | 0.6143 | 0.4891 | 0.4909 | 0.5289 | 0.4886 | 0.4926 | 0.4923 | 0.5114 | 0.4926 |
D4 | 0.8481 | 0.8407 | 0.8370 | 0.8333 | 0.8448 | 0.8371 | 0.8335 | 0.8305 | 0.8481 | 0.8407 | 0.8370 | 0.8333 |
D5 | 0.7810 | 0.7762 | 0.7831 | 0.7732 | 0.6979 | 0.6659 | 0.6957 | 0.6839 | 0.8647 | 0.8621 | 0.8663 | 0.8595 |
D6 | - | 0.6864 | 0.6864 | 0.6603 | - | 0.4346 | 0.4346 | 0.4296 | - | 0.6864 | 0.6864 | 0.6603 |
D7 | 0.9490 | 0.9490 | 0.9490 | 0.9332 | 0.9425 | 0.9419 | 0.9430 | 0.9252 | 0.9490 | 0.9490 | 0.9490 | 0.9332 |
D8 | 0.7590 | 0.7590 | 0.7590 | 0.7590 | 0.6886 | 0.6886 | 0.6886 | 0.6886 | 0.1590 | 0.1590 | 0.1590 | 0.1590 |
D9 | - | 0.4764 | 0.4778 | 0.4778 | - | 0.2279 | 0.2679 | 0.2288 | - | 0.6695 | 0.6695 | 0.6695 |
D10 | 0.9000 | 0.8405 | 0.8900 | 0.9000 | 0.8174 | 0.6807 | 0.8035 | 0.8093 | 0.7367 | 0.8865 | 0.7317 | 0.7385 |
Tab. 3 Index scores of 10 datasets under SVM classifier
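The score tables report micro-F1, macro-F1 and G-Mean, but this page does not define them. The sketch below uses common textbook definitions (micro-F1 from pooled counts, macro-F1 as the unweighted mean of per-class F1, G-Mean as the geometric mean of per-class recall); these are assumptions and may differ from the exact variants used by the authors. The function names and the toy labels are hypothetical.

```python
import math


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def scores(y_true, y_pred):
    """Return (micro_f1, macro_f1, g_mean) under the assumed definitions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t
    # micro-F1: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # macro-F1: compute F1 per class, then average without weighting.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # G-Mean: geometric mean of recall over classes present in y_true.
    recalls = [tp[c] / (tp[c] + fn[c]) for c in labels if tp[c] + fn[c] > 0]
    g_mean = math.prod(recalls) ** (1 / len(recalls))
    return micro, macro, g_mean


# Toy imbalanced example: four majority (0) and two minority (1) samples.
micro, macro, g = scores([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
print(f"micro-F1={micro:.4f} macro-F1={macro:.4f} G-Mean={g:.4f}")
```

Under these definitions macro-F1 and G-Mean both penalize poor minority-class performance more than micro-F1 does, which is the usual reason for reporting them on imbalanced data.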
Dataset | micro-F1 (CFS) | micro-F1 (PLCFS) | micro-F1 (mRMR) | micro-F1 (Relief) | macro-F1 (CFS) | macro-F1 (PLCFS) | macro-F1 (mRMR) | macro-F1 (Relief) | G-Mean (CFS) | G-Mean (PLCFS) | G-Mean (mRMR) | G-Mean (Relief) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
D1 | 0.5373 | 0.5264 | 0.6659 | 0.5462 | 0.1205 | 0.1306 | 0.2786 | 0.1828 | 0.7144 | 0.7074 | 0.8020 | 0.7199 |
D2 | 0.8290 | 0.8261 | 0.8536 | 0.8261 | 0.7072 | 0.7044 | 0.7405 | 0.6940 | 0.8290 | 0.8261 | 0.8536 | 0.8261 |
D3 | 0.6468 | 0.6468 | 0.6515 | 0.6561 | 0.4757 | 0.4947 | 0.4680 | 0.4988 | 0.5263 | 0.5258 | 0.5309 | 0.5322 |
D4 | 0.8333 | 0.8111 | 0.8148 | 0.8000 | 0.8299 | 0.8057 | 0.8119 | 0.7954 | 0.8333 | 0.8111 | 0.8148 | 0.8000 |
D5 | 0.8225 | 0.8260 | 0.8290 | 0.8190 | 0.7017 | 0.7043 | 0.7038 | 0.6956 | 0.8916 | 0.8937 | 0.8964 | 0.8895 |
D6 | - | 0.6906 | 0.7178 | 0.7513 | - | 0.4453 | 0.4594 | 0.4909 | - | 0.6906 | 0.7731 | 0.7178 |
D7 | 0.9473 | 0.9402 | 0.9526 | 0.9473 | 0.9406 | 0.9371 | 0.9476 | 0.9407 | 0.9473 | 0.9438 | 0.9526 | 0.9473 |
D8 | 0.6633 | 0.6887 | 0.6786 | 0.6586 | 0.3611 | 0.3855 | 0.3812 | 0.3755 | 0.6633 | 0.6887 | 0.6786 | 0.6586 |
D9 | - | 0.4258 | 0.4556 | 0.4232 | - | 0.2133 | 0.2173 | 0.2131 | - | 0.6301 | 0.6528 | 0.6281 |
D10 | 0.8600 | 0.8205 | 0.8900 | 0.8800 | 0.6860 | 0.6669 | 0.8020 | 0.8065 | 0.7103 | 0.8754 | 0.7301 | 0.7224 |
Tab. 4 Index scores of 10 datasets under KNN classifier
Dataset | micro-F1 (CFS) | micro-F1 (PLCFS) | micro-F1 (mRMR) | micro-F1 (Relief) | macro-F1 (CFS) | macro-F1 (PLCFS) | macro-F1 (mRMR) | macro-F1 (Relief) | G-Mean (CFS) | G-Mean (PLCFS) | G-Mean (mRMR) | G-Mean (Relief) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
D1 | 0.5661 | 0.5397 | 0.6614 | 0.5506 | 0.2105 | 0.1848 | 0.3363 | 0.2224 | 0.7346 | 0.7163 | 0.7985 | 0.7207 |
D2 | 0.8101 | 0.8217 | 0.8348 | 0.8264 | 0.6919 | 0.7061 | 0.7179 | 0.7132 | 0.8101 | 0.8217 | 0.8348 | 0.7971 |
D3 | 0.6468 | 0.6987 | 0.6515 | 0.6543 | 0.4991 | 0.5369 | 0.6160 | 0.5686 | 0.5234 | 0.5562 | 0.3230 | 0.3609 |
D4 | 0.8148 | 0.8185 | 0.8148 | 0.8133 | 0.8079 | 0.8124 | 0.8094 | 0.8075 | 0.8148 | 0.8185 | 0.8148 | 0.8148 |
D5 | 0.8299 | 0.8333 | 0.8437 | 0.8332 | 0.7234 | 0.7249 | 0.7348 | 0.7239 | 0.8964 | 0.8986 | 0.9049 | 0.8947 |
D6 | - | 0.5932 | 0.4984 | 0.5603 | - | 0.3626 | 0.3222 | 0.3296 | - | 0.5932 | 0.4984 | 0.7450 |
D7 | 0.9350 | 0.9438 | 0.9385 | 0.9350 | 0.9262 | 0.9373 | 0.9305 | 0.9242 | 0.9350 | 0.9438 | 0.9385 | 0.9385 |
D8 | 0.6332 | 0.6485 | 0.6588 | 0.6411 | 0.3639 | 0.3574 | 0.3787 | 0.3693 | 0.6332 | 0.6485 | 0.6588 | 0.5936 |
D9 | - | 0.4441 | 0.4441 | 0.4441 | - | 0.2794 | 0.3150 | 0.2873 | - | 0.6438 | 0.6438 | 0.6417 |
D10 | 0.9400 | 0.8805 | 0.9200 | 0.9000 | 0.8574 | 0.6500 | 0.7898 | 0.7555 | 0.7634 | 0.9197 | 0.7495 | 0.7625 |
Tab. 5 Index scores of 10 datasets under RF classifier
Dataset | micro-F1 (CFS) | micro-F1 (PLCFS) | micro-F1 (mRMR) | micro-F1 (Relief) | macro-F1 (CFS) | macro-F1 (PLCFS) | macro-F1 (mRMR) | macro-F1 (Relief) | G-Mean (CFS) | G-Mean (PLCFS) | G-Mean (mRMR) | G-Mean (Relief) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
D1 | 0.4160 | 0.4335 | 0.6615 | 0.4445 | 0.1741 | 0.2057 | 0.3141 | 0.2031 | 0.6247 | 0.6403 | 0.7988 | 0.6486 |
D2 | 0.7913 | 0.8116 | 0.8217 | 0.8000 | 0.6919 | 0.7187 | 0.6962 | 0.6875 | 0.7913 | 0.8116 | 0.8217 | 0.8000 |
D3 | 0.6609 | 0.6609 | 0.6563 | 0.6609 | 0.6356 | 0.6356 | 0.6315 | 0.6356 | 0.3335 | 0.3335 | 0.3290 | 0.3335 |
D4 | 0.7519 | 0.7852 | 0.8074 | 0.7815 | 0.7470 | 0.7816 | 0.8044 | 0.7782 | 0.7519 | 0.7852 | 0.8074 | 0.7815 |
D5 | 0.8307 | 0.8299 | 0.8333 | 0.8255 | 0.7210 | 0.7022 | 0.7242 | 0.7160 | 0.8972 | 0.8974 | 0.8987 | 0.8937 |
D6 | - | 0.5655 | 0.5571 | 0.7586 | - | 0.3774 | 0.3752 | 0.4913 | - | 0.5655 | 0.5571 | 0.7586 |
D7 | 0.9226 | 0.9192 | 0.9139 | 0.9367 | 0.9132 | 0.9106 | 0.9039 | 0.9285 | 0.9226 | 0.9192 | 0.9139 | 0.9367 |
D8 | 0.5332 | 0.5429 | 0.5731 | 0.5178 | 0.3390 | 0.3285 | 0.3362 | 0.3156 | 0.5332 | 0.5429 | 0.5731 | 0.5178 |
D9 | - | 0.4502 | 0.4522 | 0.4475 | - | 0.2538 | 0.3100 | 0.2139 | - | 0.6487 | 0.6502 | 0.6466 |
D10 | 0.9600 | 0.8905 | 0.9300 | 0.9600 | 0.9244 | 0.7000 | 0.8629 | 0.8945 | 0.7747 | 0.9237 | 0.7535 | 0.7755 |
Tab. 6 Index scores of 10 datasets under DT classifier
Dataset | micro-F1 (CFS) | micro-F1 (PLCFS) | micro-F1 (mRMR) | micro-F1 (Relief) | macro-F1 (CFS) | macro-F1 (PLCFS) | macro-F1 (mRMR) | macro-F1 (Relief) | G-Mean (CFS) | G-Mean (PLCFS) | G-Mean (mRMR) | G-Mean (Relief) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
D1 | 0.5773 | 0.5751 | 0.6616 | 0.5817 | 0.1918 | 0.1945 | 0.2799 | 0.2464 | 0.7428 | 0.7399 | 0.7990 | 0.7448 |
D2 | 0.8391 | 0.8377 | 0.8435 | 0.8348 | 0.7192 | 0.7176 | 0.7115 | 0.6980 | 0.8391 | 0.8377 | 0.8435 | 0.8348 |
D3 | 0.6374 | 0.6422 | 0.6421 | 0.6468 | 0.4561 | 0.4727 | 0.4725 | 0.4655 | 0.5152 | 0.5214 | 0.5159 | 0.5261 |
D4 | 0.8296 | 0.8296 | 0.8444 | 0.8333 | 0.8263 | 0.8260 | 0.8417 | 0.8300 | 0.8296 | 0.8296 | 0.8444 | 0.8333 |
D5 | 0.8017 | 0.8017 | 0.8260 | 0.8195 | 0.6890 | 0.6824 | 0.7036 | 0.6973 | 0.8789 | 0.8771 | 0.8944 | 0.8903 |
D6 | - | 0.5956 | 0.5956 | 0.5538 | - | 0.3955 | 0.3955 | 0.3660 | - | 0.5956 | 0.5956 | 0.5538 |
D7 | 0.9455 | 0.9508 | 0.9438 | 0.9437 | 0.9387 | 0.9448 | 0.9365 | 0.9363 | 0.9455 | 0.9508 | 0.9438 | 0.9437 |
D8 | 0.6933 | 0.6533 | 0.6333 | 0.7037 | 0.3696 | 0.3709 | 0.3525 | 0.3735 | 0.6933 | 0.6533 | 0.6333 | 0.7037 |
D9 | - | 0.4758 | 0.4805 | 0.4764 | - | 0.2350 | 0.2998 | 0.2357 | - | 0.6680 | 0.6715 | 0.6685 |
D10 | 0.9300 | 0.8605 | 0.9300 | 0.9300 | 0.8524 | 0.7113 | 0.8479 | 0.8710 | 0.7570 | 0.9052 | 0.7570 | 0.7559 |
Tab. 7 Index scores of 10 datasets under LR classifier