Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (1): 109-114.DOI: 10.11772/j.issn.1001-9081.2021010128
• Data science and technology •
Yongbo CHEN, Qiaoqin LI, Yongguo LIU
Received: 2021-01-25
Revised: 2021-03-29
Accepted: 2021-05-17
Online: 2021-06-04
Published: 2022-01-10
Contact: Yongguo LIU
About author: CHEN Yongbo, born in 1995, from Chifeng, Inner Mongolia, M. S. candidate. His research interests include cloud computing and big data.
Yongbo CHEN, Qiaoqin LI, Yongguo LIU. Dynamic relevance based feature selection algorithm[J]. Journal of Computer Applications, 2022, 42(1): 109-114.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021010128
MI | CMI | JMI |
---|---|---|
Tab. 2 Amount of information between feature and class
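The three quantities named in Tab. 2 have standard information-theoretic definitions: mutual information I(X;Y), conditional mutual information I(X;Y|Z), and joint mutual information I(X,Z;Y). The following is a minimal Python sketch, assuming discrete features and plug-in (empirical frequency) entropy estimates. The function names and the toy XOR data are hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Empirical joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    """MI: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """CMI: I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def joint_mutual_information(x, z, y):
    """JMI: I(X,Z;Y) = H(X,Z) + H(Y) - H(X,Z,Y)."""
    return entropy(x, z) + entropy(y) - entropy(x, z, y)


# Toy data (an assumption for illustration): the class is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
cls = [a ^ b for a, b in zip(f1, f2)]

mi = mutual_information(f1, cls)                    # feature vs. class
cmi = conditional_mutual_information(f1, cls, f2)   # feature vs. class, given the other feature
jmi = joint_mutual_information(f1, f2, cls)         # feature pair vs. class
print(f"MI={mi:.2f}  CMI={cmi:.2f}  JMI={jmi:.2f}")
```

On this toy data a single feature alone carries no information about the class (MI is 0 bits), while conditioning on, or joining with, the second feature recovers one full bit. The values also satisfy the chain rule I(X,Z;Y) = I(Z;Y) + I(X;Y|Z).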
Dataset | Features | Samples | Classes | Type |
---|---|---|---|---|
lung_discrete | 325 | 73 | 7 | Discrete |
madelon | 500 | 2 600 | 2 | Continuous |
Yale | 1 024 | 165 | 15 | Continuous |
ORL | 1 024 | 400 | 40 | Continuous |
warpAR10P | 2 400 | 130 | 10 | Continuous |
lung | 3 312 | 203 | 5 | Continuous |
lymphoma | 4 026 | 96 | 9 | Discrete |
GLIOMA | 4 434 | 50 | 4 | Continuous |
TOX_171 | 5 748 | 171 | 4 | Continuous |
Prostate_GE | 5 966 | 102 | 2 | Continuous |
leukemia | 7 070 | 72 | 2 | Discrete |
nci9 | 9 712 | 60 | 9 | Discrete |
Tab. 3 Description of experimental datasets
Dataset | JMI | mRMR | MRI | DCSF | MRMD | DRFS |
---|---|---|---|---|---|---|
lung_discrete | 86.25 | 81.79 | 85.18 | 85.00 | 82.50 | 86.43 |
madelon | 85.42 | 68.92 | 85.18 | 72.09 | 71.58 | 85.81 |
Yale | 63.01 | 60.66 | 54.67 | 48.68 | 62.39 | 63.48 |
ORL | 83.25 | 86.00 | 85.50 | 87.50 | 85.75 | 84.75 |
warpAR10P | 78.46 | 78.46 | 67.69 | 68.46 | 76.92 | 83.08 |
lung | 97.02 | 96.07 | 95.02 | 95.12 | 96.60 | 97.02 |
lymphoma | 90.87 | 91.67 | 93.78 | 90.56 | 90.78 | 90.44 |
GLIOMA | 76.00 | 80.00 | 72.00 | 80.00 | 78.00 | 80.00 |
TOX_171 | 70.78 | 70.75 | 73.59 | 71.93 | 73.76 | 74.90 |
Prostate_GE | 94.18 | 94.00 | 94.00 | 92.18 | 94.18 | 94.36 |
leukemia | 95.89 | 95.89 | 97.32 | 97.14 | 96.07 | 97.50 |
nci9 | 55.00 | 56.67 | 50.00 | 46.67 | 56.67 | 56.67 |
Tab. 4 Comparison of classification accuracy (%) of different algorithms with the 3NN classifier
Dataset | JMI | mRMR | MRI | DCSF | MRMD | DRFS |
---|---|---|---|---|---|---|
lung_discrete | 84.82 | 86.26 | 87.32 | 86.79 | 86.43 | 87.32 |
madelon | 54.19 | 52.73 | 52.85 | 53.88 | 51.08 | 52.12 |
Yale | 65.40 | 66.03 | 65.48 | 62.43 | 67.21 | 67.83 |
ORL | 86.25 | 87.25 | 86.75 | 86.75 | 87.00 | 87.25 |
warpAR10P | 88.46 | 91.54 | 90.00 | 88.48 | 86.15 | 92.31 |
lung | 95.55 | 95.60 | 95.57 | 95.07 | 95.60 | 96.10 |
lymphoma | 89.44 | 91.00 | 92.56 | 92.67 | 91.78 | 92.78 |
GLIOMA | 76.00 | 76.00 | 72.00 | 76.00 | 76.00 | 76.00 |
TOX_171 | 83.69 | 85.36 | 77.78 | 82.35 | 86.05 | 81.90 |
Prostate_GE | 95.09 | 95.27 | 93.09 | 93.18 | 95.18 | 92.09 |
leukemia | 94.46 | 95.71 | 94.45 | 95.17 | 94.29 | 96.07 |
nci9 | 51.67 | 43.33 | 48.33 | 50.00 | 51.67 | 51.67 |
Tab. 5 Comparison of classification accuracy (%) of different algorithms with the SVM classifier