Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (10): 3046-3053. DOI: 10.11772/j.issn.1001-9081.2021081486

• Artificial Intelligence •

Automatic feature selection algorithm based on interaction of ReliefF with maximum information coefficient and SVM

Qian GE, Guangbin ZHANG, Xiaofeng ZHANG

  1. School of Physics and Information Technology, Shaanxi Normal University, Xi'an Shaanxi 710119, China
  • Received: 2021-08-19 Revised: 2021-11-20 Accepted: 2021-11-21 Online: 2022-01-07 Published: 2022-10-10
  • Contact: Guangbin ZHANG
  • About author: GE Qian (first contact), born in 1995 in Tengzhou, Shandong, M. S. candidate. Her research interests include signal processing.
    ZHANG Guangbin, born in 1971 in Sanyuan, Shaanxi, Ph. D., professor. His research interests include signal and information processing, image segmentation; guangbinzhang@snnu.edu.cn
    ZHANG Xiaofeng, born in 1971 in Sanyuan, Shaanxi, Ph. D., professor. Her research interests include signal processing.

Abstract:

To address the poor stability of the ReliefF feature selection algorithm and the low classification accuracy of the feature subsets it selects, both caused by using Euclidean distance to select nearest-neighbor samples, the MICReliefF (Maximum Information Coefficient-ReliefF) algorithm was proposed, which uses the Maximum Information Coefficient (MIC) as the criterion for selecting nearest-neighbor samples. At the same time, the classification accuracy of the Support Vector Machine (SVM) model was used as the evaluation index, and the optimal feature subset was determined automatically through repeated optimization, thereby realizing interactive optimization of the MICReliefF algorithm and the classification model, namely the MICReliefF-SVM automatic feature selection algorithm. The performance of the MICReliefF-SVM algorithm was verified on several UCI public datasets. Experimental results show that the MICReliefF-SVM automatic feature selection algorithm can not only filter out more redundant features, but also select feature subsets with good stability and generalization ability. Compared with Random Forest (RF), max-Relevance and Min-Redundancy (mRMR), Correlation-based Feature Selection (CFS) and other classical feature selection algorithms, the MICReliefF-SVM algorithm achieves higher classification accuracy.

Key words: feature selection, Maximum Information Coefficient (MIC), ReliefF algorithm, Support Vector Machine (SVM), Extreme Learning Machine (ELM)
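The two-stage procedure outlined in the abstract (ReliefF weighting with a non-Euclidean neighbor criterion, then a classifier-driven choice of the subset size) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: all function names are hypothetical, the absolute Pearson correlation between samples stands in for the MIC, a leave-one-out 1-nearest-neighbor accuracy stands in for the SVM accuracy, and a simple sweep over top-k prefixes stands in for the paper's repeated optimization, whose details the abstract does not give. Features are assumed to be scaled to [0, 1].

```python
import random


def abs_pearson(u, v):
    """Stand-in similarity between two samples (NOT the MIC)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return abs(cov) / (su * sv) if su > 0 and sv > 0 else 0.0


def relieff_weights(X, y, similarity, n_iter=50, k=2, seed=0):
    """ReliefF-style weights; neighbors are the k most similar samples."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    prior = {c: y.count(c) / n for c in set(y)}
    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        for c in prior:
            # k most similar samples of class c, excluding the sample itself
            pool = [j for j in range(n) if j != i and y[j] == c]
            pool.sort(key=lambda j: similarity(X[i], X[j]), reverse=True)
            near = pool[:k]
            if not near:
                continue
            # hits lower the weight; misses raise it, scaled by class prior
            scale = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for f in range(d):
                diff = sum(abs(X[i][f] - X[j][f]) for j in near) / len(near)
                w[f] += scale * diff / n_iter
    return w


def loo_1nn_accuracy(X, y, subset):
    """Stand-in evaluator (NOT an SVM): leave-one-out 1-NN accuracy."""
    hits = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(
            others, key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in subset)
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def select_top_k(weights, evaluate):
    """Sweep prefixes of the weight ranking; keep the best-scoring subset."""
    ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    best_subset, best_score = [], float("-inf")
    for size in range(1, len(ranking) + 1):
        subset = ranking[:size]
        score = evaluate(subset)
        if score > best_score:  # strict comparison: ties favor the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data: feature 0 equals the label, feature 1 is constant, feature 2 is noise.
X = [[0.0, 0.5, 0.2], [0.0, 0.5, 0.8], [0.0, 0.5, 0.4],
     [1.0, 0.5, 0.3], [1.0, 0.5, 0.7], [1.0, 0.5, 0.6]]
y = [0, 0, 0, 1, 1, 1]

weights = relieff_weights(X, y, abs_pearson)
subset, score = select_top_k(weights, lambda s: loo_1nn_accuracy(X, y, s))
print(weights, subset, score)
```

Because both the similarity and the evaluator are passed in as callables, an MIC estimator and a cross-validated SVM accuracy could be substituted for the stand-ins without changing the rest of the sketch.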

CLC Number: