Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (12): 3292-3296.

• Artificial Intelligence •

• Supported by: Key Project of the Fundamental Research Funds for the Central Universities; Natural Science Basic Research Plan of Shaanxi Province; Fundamental Research Funds for the Central Universities

Hybrid feature selection methods based on D-score and support vector machine

XIE Juan-ying1,2,LEI Jin-hu1,XIE Wei-xin2,3,GAO Xin-bo2   

1. School of Computer Science, Shaanxi Normal University, Xi'an Shaanxi 710062, China
    2. School of Electronic Engineering, Xidian University, Xi’an Shaanxi 710071, China
    3. School of Information Engineering, Shenzhen University, Shenzhen Guangdong 518060, China
  • Received:2011-06-20 Revised:2011-08-08 Online:2011-12-12 Published:2011-12-01
  • Contact: XIE Juan-ying



Abstract: As a feature selection criterion, F-score does not account for the influence of different measurement scales on the importance of different features. To evaluate the discriminative power of features between classes, a new criterion called D-score was proposed. Like the improved F-score, D-score measures the discrimination between more than two sets of real numbers; unlike it, D-score is not affected by the measurement units of features. With D-score as the criterion of feature importance, the Sequential Forward Search (SFS), Sequential Forward Floating Search (SFFS), and Sequential Backward Floating Search (SBFS) strategies were respectively adopted to select features, with Support Vector Machine (SVM) as the classification tool, yielding three new hybrid feature selection methods. These methods combine the advantages of Filter and Wrapper approaches: the SVM evaluates the classification capability of each candidate feature subset via its classification accuracy and thereby guides the selection procedure. The three methods were tested on nine datasets from the UCI machine learning repository and compared with the corresponding algorithms that use the improved F-score as the measure of feature discriminability. The experimental results show that D-score outperforms F-score in evaluating the discriminative power of features, and that the proposed methods achieve dimensionality reduction without compromising the classification capability of the datasets.
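The hybrid scheme described above can be sketched in a few lines: a filter score ranks the features, sequential forward search adds them in ranked order, and a wrapper classifier's accuracy decides whether each addition is kept. The abstract does not give the D-score formula, so this minimal sketch substitutes the classical F-score as the filter criterion and a leave-one-out nearest-centroid classifier in place of the SVM wrapper; all function names and the toy dataset are illustrative, not the paper's implementation.

```python
# Hedged sketch of filter+wrapper hybrid feature selection (SFS variant).
# F-score stands in for the paper's D-score; nearest-centroid stands in for SVM.
from statistics import mean, variance

def f_score(xs, ys, feat):
    """Classical two-class F-score of one feature (stand-in filter criterion)."""
    pos = [x[feat] for x, y in zip(xs, ys) if y == 1]
    neg = [x[feat] for x, y in zip(xs, ys) if y == 0]
    overall = mean(pos + neg)
    num = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    den = variance(pos) + variance(neg)
    return num / den if den else 0.0

def loo_accuracy(xs, ys, feats):
    """Leave-one-out nearest-centroid accuracy on the selected features
    (wrapper evaluation; an SVM would be used here in the paper)."""
    correct = 0
    for i in range(len(xs)):
        cents = {}
        for c in (0, 1):
            pts = [xs[j] for j in range(len(xs)) if j != i and ys[j] == c]
            cents[c] = [mean(p[f] for p in pts) for f in feats]
        dist = {c: sum((xs[i][f] - cents[c][k]) ** 2
                       for k, f in enumerate(feats)) for c in (0, 1)}
        correct += min(dist, key=dist.get) == ys[i]
    return correct / len(xs)

def sfs_select(xs, ys):
    """Sequential forward search: add features in filter-score order,
    keep an addition only if wrapper accuracy strictly improves."""
    ranked = sorted(range(len(xs[0])),
                    key=lambda f: f_score(xs, ys, f), reverse=True)
    best, best_acc = [], 0.0
    for f in ranked:
        acc = loo_accuracy(xs, ys, best + [f])
        if acc > best_acc:
            best, best_acc = best + [f], acc
    return best, best_acc
```

On a toy dataset where feature 0 separates the classes and feature 1 is noise, the search keeps only feature 0, illustrating the dimensionality reduction the abstract reports. The floating variants (SFFS/SBFS) extend this loop with conditional removal/re-addition steps.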

Key words: D-score, F-score, support vector machines (SVM), feature selection, criterion, dimension reduction