Journal of Computer Applications, 2017, Vol. 37, Issue (9): 2433-2438. DOI: 10.11772/j.issn.1001-9081.2017.09.2433
WANG Xiang1,2, HU Xuegang1
Received: 2017-03-27
Revised: 2017-04-21
Online: 2017-09-13
Published: 2017-09-10
Supported by: This work is partially supported by the National Basic Research Program (973 Program) of China (2016YFC0801406), the National Natural Science Foundation of China (61673152), and the Natural Science Foundation of Anhui Province (1408085QF136).
王翔1,2, 胡学钢1
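Corresponding author: WANG Xiang, wangxiang@ahinfo.gov.cn
About the authors: WANG Xiang (born 1982), male, from Hefei, Anhui, Ph.D. candidate; main research interests: data mining, artificial intelligence, and intelligence analysis. HU Xuegang (born 1962), male, from Hefei, Anhui, professor, Ph.D.; main research interests: data mining, artificial intelligence, and big data analysis.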
WANG Xiang, HU Xuegang. Overview on feature selection in high-dimensional and small-sample-size classification[J]. Journal of Computer Applications, 2017, 37(9): 2433-2438.
王翔, 胡学钢. 高维小样本分类问题中特征选择研究综述[J]. 计算机应用, 2017, 37(9): 2433-2438.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2017.09.2433