Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (9): 2433-2438. DOI: 10.11772/j.issn.1001-9081.2017.09.2433

• CRSSC-CWI-CGrC-3WD 2017 •

Overview on feature selection in high-dimensional and small-sample-size classification

WANG Xiang1,2, HU Xuegang1

  1. School of Computer and Information, Hefei University of Technology, Hefei Anhui 230009, China;
  2. Literature Information Analysis Department, Anhui Institute of Scientific and Technical Information, Hefei Anhui 230011, China
  • Received: 2017-03-27 Revised: 2017-04-21 Online: 2017-09-10 Published: 2017-09-13
  • Corresponding author: WANG Xiang, wangxiang@ahinfo.gov.cn
  • About the authors: WANG Xiang, born in 1982 in Hefei, Anhui, is a Ph.D. candidate. His research interests include data mining, artificial intelligence, and intelligence analysis. HU Xuegang, born in 1962 in Hefei, Anhui, Ph.D., is a professor. His research interests include data mining, artificial intelligence, and big data analysis.
  • Supported by:

    This work is partially supported by the National Basic Research Program (973 Program) of China (2016YFC0801406), the National Natural Science Foundation of China (61673152), and the Natural Science Foundation of Anhui Province (1408085QF136).

Abstract:

With the development of bioinformatics, gene expression microarrays, and image recognition, classification of high-dimensional, small-sample-size data has become a challenging task in data mining, machine learning, and pattern recognition. Such data readily give rise to the "curse of dimensionality" and to overfitting. Feature selection can effectively avoid the curse of dimensionality and improve the generalization ability of classification models, and it has therefore become a hot research topic. Accordingly, recent domestic and international research on feature selection for high-dimensional, small-sample-size classification is reviewed. First, the nature of the high-dimensional, small-sample-size feature selection problem is analyzed. Second, according to their essential differences, feature selection algorithms for high-dimensional, small-sample-size classification are divided into four categories and compared to summarize their advantages and disadvantages. Finally, the challenges facing feature selection on high-dimensional, small-sample-size data and prospects for future research are discussed.

Key words: feature selection, high-dimensional data, small-sample-size learning, information filtering, Support Vector Machine (SVM)

CLC Number: