Journal of Computer Applications, 2018, Vol. 38, Issue (5): 1245-1249. DOI: 10.11772/j.issn.1001-9081.2017112730

• Artificial Intelligence •

Hybrid feature selection algorithm fused Shapley value and particle swarm optimization

DENG Xiuqin¹, LI Wenzhou¹, WU Jigang², LIU Taiheng¹

  1. School of Applied Mathematics, Guangdong University of Technology, Guangzhou, Guangdong 510006, China;
  2. School of Computers, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
  • Received: 2017-11-18 Revised: 2017-12-20 Online: 2018-05-10 Published: 2018-05-24
  • Contact: DENG Xiuqin
  • About the authors: DENG Xiuqin (born 1966), female, Yao ethnicity, from Lianzhou, Guangdong; professor, M.S.; research interests: data mining and intelligent computing. LI Wenzhou (born 1994), male, from Jieyang, Guangdong; M.S. candidate; research interests: data mining. WU Jigang (born 1963), male, from Peixian, Jiangsu; professor, doctoral supervisor, Ph.D., CCF member; research interests: mobile cloud computing and machine intelligence. LIU Taiheng (born 1993), male, from Zhaoqing, Guangdong; M.S. candidate; research interests: data mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672171) and the Graduate Students' Innovation and Competition Program of Guangdong University of Technology (2017YJSCX039).

Abstract: In pattern classification, data often contain irrelevant or redundant features that degrade classification accuracy. To address this problem, a hybrid feature selection algorithm that fuses the Shapley value with Particle Swarm Optimization (PSO) was proposed, with the aim of obtaining the best classification performance from the fewest features. The Shapley value from game theory was introduced into the local search of PSO: first, the contribution of each feature in a particle (a feature subset) to the classification performance, that is, its Shapley value, was calculated; then the features with the lowest Shapley values were deleted step by step to optimize the feature subset and update the particle, which also enhanced the global search ability of the algorithm. Finally, the improved PSO algorithm was applied to feature selection, with the classification performance of a Support Vector Machine (SVM) classifier and the number of selected features used as the evaluation criteria for a feature subset. Classification experiments were carried out on 17 medical datasets with different numbers of features, taken from the UCI machine learning datasets and gene expression datasets. The experimental results show that the proposed algorithm effectively removes more than 55% of the irrelevant or redundant features in the datasets, and more than 80% for medium and large datasets, while the selected feature subsets retain good classification ability: the classification accuracy is improved by 2 to 23 percentage points.

Key words: pattern classification, Particle Swarm Optimization (PSO) algorithm, Shapley value, feature selection, Support Vector Machine (SVM)
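
The local search step described in the abstract (compute each feature's Shapley value within a particle, then delete the lowest-valued feature) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `shapley_values`, `prune_particle` and `toy_accuracy`, the `penalty` weight and the stopping rule are all assumptions made for illustration, and the PSO velocity and position updates are omitted. `toy_accuracy` is a hypothetical stand-in for the SVM classification accuracy used in the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature under the characteristic function `value`.

    `value` maps a frozenset of features to a score. The cost is exponential in
    len(features), so this exact form only suits small subsets.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def prune_particle(features, value, penalty=0.01):
    """Greedy local search (assumed stopping rule): drop the lowest-Shapley
    feature as long as the size-penalised fitness does not get worse."""
    def fitness(subset):
        return value(frozenset(subset)) - penalty * len(subset)

    current = list(features)
    while len(current) > 1:
        phi = shapley_values(current, value)
        worst = min(current, key=lambda f: phi[f])
        candidate = [f for f in current if f != worst]
        if fitness(candidate) < fitness(current):
            break
        current = candidate
    return current


def toy_accuracy(subset):
    """Hypothetical accuracy: feature 1 is useful, features 0 and 2 are
    redundant with each other, and feature 3 is pure noise."""
    score = 0.5
    if 0 in subset or 2 in subset:
        score += 0.2
    if 1 in subset:
        score += 0.2
    return score


if __name__ == "__main__":
    print(shapley_values([0, 1, 2, 3], toy_accuracy))
    print(prune_particle([0, 1, 2, 3], toy_accuracy))
```

In this toy setting the noise feature receives a Shapley value of zero and is removed first, and the two redundant features split their shared contribution equally, so one of them is removed next. For subsets of realistic size the exact enumeration would have to be replaced by a sampling-based estimate; the abstract does not say which variant the paper uses.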
