Improved feature selection and classification algorithm for gene expression programming based on layer distance
ZHAN Hang, HE Lang, HUANG Zhangcan, LI Huafeng, ZHANG Qiang, TAN Qing
Journal of Computer Applications
2021, 41(9): 2658-2667.
DOI: 10.11772/j.issn.1001-9081.2020111801
General feature selection algorithms do not reveal an interpretable mapping relationship between data features and data categories. To address this problem, an improved Feature Selection classification algorithm based on Layer Distance for Gene Expression Programming (FSLDGEP) was proposed, which builds on Gene Expression Programming (GEP) by introducing new initialization methods, mutation strategies and fitness evaluation methods. Firstly, a selection probability was defined to initialize the individuals in the population directionally, so as to increase the number of effective individuals in the population. Secondly, the layer neighborhood of an individual was proposed, so that each individual in the population mutates based on its layer neighborhood, which solves the problem of blind and unguided mutation. Finally, the dimension reduction rate and the classification accuracy were combined as the fitness value of an individual, which replaced the single-objective mode of population evolution and balanced the two objectives. 5-fold and 10-fold cross-validation were performed on 7 datasets; the proposed algorithm gave the functional mapping relationship between data features and their categories, and the obtained mapping function was used for data classification. Compared with Feature Selection based on Forest Optimization Algorithm (FSFOA), feature evaluation and selection based on Neighborhood Soft Margin (NSM), Feature Selection based on Neighborhood Effective Information Ratio (FS-NEIR) and other comparison algorithms, the proposed algorithm obtained the best dimension reduction rate on the Hepatitis, Wisconsin Prognostic Breast Cancer (WPBC), Sonar and Wisconsin Diagnostic Breast Cancer (WDBC) datasets, and the best average classification accuracy on the Hepatitis, Ionosphere, Musk1, WPBC, Heart-Statlog and WDBC datasets. Experimental results verify the feasibility, effectiveness and superiority of the proposed algorithm in feature selection and classification.
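The abstract states that the dimension reduction rate and the classification accuracy are combined into a single fitness value, but it does not give the formula. The following Python sketch illustrates one plausible form under stated assumptions: the reduction rate is taken as the fraction of features discarded, and the two terms are blended with a hypothetical weight `weight`. The function names and the weighted-sum form are illustrative only and are not taken from the paper.

```python
def dimension_reduction_rate(n_selected: int, n_total: int) -> float:
    """Fraction of the original features that were discarded (assumed definition)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must lie in [0, n_total]")
    return 1.0 - n_selected / n_total


def combined_fitness(accuracy: float, n_selected: int, n_total: int,
                     weight: float = 0.5) -> float:
    """Weighted sum of accuracy and reduction rate.

    `weight` is a hypothetical trade-off parameter; the paper's actual
    combination rule may differ.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    reduction = dimension_reduction_rate(n_selected, n_total)
    return weight * accuracy + (1.0 - weight) * reduction


if __name__ == "__main__":
    # An individual that uses 5 of 20 features and classifies 90% correctly.
    print(combined_fitness(accuracy=0.9, n_selected=5, n_total=20))
```

With a fitness of this kind, an individual that uses fewer features can outrank a slightly more accurate one, which is the balance between the two objectives that the abstract describes.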