Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (12): 3292-3296.

• Artificial Intelligence •

• Supported by: Key Project of the Fundamental Research Funds for the Central Universities; Natural Science Basic Research Plan of Shaanxi Province; Fundamental Research Funds for the Central Universities

Hybrid feature selection methods based on D-score and support vector machine

XIE Juan-ying1,2,LEI Jin-hu1,XIE Wei-xin2,3,GAO Xin-bo2   

1. School of Computer Science, Shaanxi Normal University, Xi'an Shaanxi 710062, China
    2. School of Electronic Engineering, Xidian University, Xi’an Shaanxi 710071, China
    3. School of Information Engineering, Shenzhen University, Shenzhen Guangdong 518060, China
  • Received:2011-06-20 Revised:2011-08-08 Online:2011-12-12 Published:2011-12-01
  • Contact: XIE Juan-ying



Abstract: As a feature selection criterion, F-score does not account for the influence of different measurement scales on the importance of different features. To evaluate the discriminative power of features between classes, a new criterion called D-score was proposed. Like the improved F-score, D-score measures the discrimination between more than two sets of real numbers; unlike it, D-score is not affected by the measurement units of features. With D-score as the criterion of feature importance, the Sequential Forward Search (SFS), Sequential Forward Floating Search (SFFS), and Sequential Backward Floating Search (SBFS) strategies were respectively adopted to select features, with Support Vector Machine (SVM) as the classification tool, yielding three new hybrid feature selection methods. These methods combine the advantages of Filter and Wrapper approaches: the SVM evaluates the classification capability of each candidate feature subset via its classification accuracy and thereby guides the selection procedure. The three methods were tested on nine datasets from the UCI machine learning repository and compared with the corresponding algorithms that use the improved F-score as the measure of feature discriminability. The experimental results show that D-score outperforms F-score in evaluating the discriminative power of features, and that the proposed methods achieve dimensionality reduction without compromising the classification capability of the datasets.
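The hybrid scheme described above can be sketched in a few lines: a filter score ranks the features, sequential forward search adds them in ranked order, and a wrapper classifier's accuracy decides whether each addition is kept. The abstract does not give the D-score formula, so this minimal sketch substitutes the classical F-score as the filter criterion and a leave-one-out nearest-centroid classifier in place of the SVM wrapper; all function names and the toy dataset are illustrative, not the paper's implementation.

```python
# Hedged sketch of filter+wrapper hybrid feature selection (SFS variant).
# F-score stands in for the paper's D-score; nearest-centroid stands in for SVM.
from statistics import mean, variance

def f_score(xs, ys, feat):
    """Classical two-class F-score of one feature (stand-in filter criterion)."""
    pos = [x[feat] for x, y in zip(xs, ys) if y == 1]
    neg = [x[feat] for x, y in zip(xs, ys) if y == 0]
    overall = mean(pos + neg)
    num = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    den = variance(pos) + variance(neg)
    return num / den if den else 0.0

def loo_accuracy(xs, ys, feats):
    """Leave-one-out nearest-centroid accuracy on the selected features
    (wrapper evaluation; an SVM would be used here in the paper)."""
    correct = 0
    for i in range(len(xs)):
        cents = {}
        for c in (0, 1):
            pts = [xs[j] for j in range(len(xs)) if j != i and ys[j] == c]
            cents[c] = [mean(p[f] for p in pts) for f in feats]
        dist = {c: sum((xs[i][f] - cents[c][k]) ** 2
                       for k, f in enumerate(feats)) for c in (0, 1)}
        correct += min(dist, key=dist.get) == ys[i]
    return correct / len(xs)

def sfs_select(xs, ys):
    """Sequential forward search: add features in filter-score order,
    keep an addition only if wrapper accuracy strictly improves."""
    ranked = sorted(range(len(xs[0])),
                    key=lambda f: f_score(xs, ys, f), reverse=True)
    best, best_acc = [], 0.0
    for f in ranked:
        acc = loo_accuracy(xs, ys, best + [f])
        if acc > best_acc:
            best, best_acc = best + [f], acc
    return best, best_acc
```

On a toy dataset where feature 0 separates the classes and feature 1 is noise, the search keeps only feature 0, illustrating the dimensionality reduction the abstract reports. The floating variants (SFFS/SBFS) extend this loop with conditional removal/re-addition steps.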

Key words: D-score, F-score, support vector machines (SVM), feature selection, criterion, dimension reduction