Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (11): 3167-3174. DOI: 10.11772/j.issn.1001-9081.2018041354

• The 7th China Conference on Data Mining (CCDM 2018) •

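Multi-label feature selection algorithm based on Laplacian score

HU Minjie, LIN Yaojin, WANG Chenxi, TANG Li, ZHENG Liping

  1. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000
  • Received: 2018-03-20 Revised: 2018-06-26 Published: 2018-11-10 Released: 2018-11-10
  • Corresponding author: HU Minjie
  • About the authors: HU Minjie (1979-), female, from Wuhan, Hubei, lecturer, M.S.; main research interest: feature selection. LIN Yaojin (1980-), male, from Zhangpu, Fujian, associate professor, Ph.D.; main research interests: data mining, granular computing. WANG Chenxi (1981-), female, from Zhangpu, Fujian, lecturer, M.S.; main research interest: data mining. TANG Li (1993-), female, from Zhangzhou, Fujian, M.S. candidate; main research interest: data mining. ZHENG Liping (1977-), female, from Putian, Fujian, lecturer, M.S.; main research interest: data mining.
  • Funding:
    National Natural Science Foundation of China (61672272); Science and Technology Project of Fujian Education Department (JAT170347, JAT170350).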

Multi-label feature selection algorithm based on Laplacian score

HU Minjie, LIN Yaojin, WANG Chenxi, TANG Li, ZHENG Liping   

  1. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
  • Received: 2018-03-20 Revised: 2018-06-26 Online: 2018-11-10 Published: 2018-11-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672272) and the Science and Technology Project of Fujian Education Department (JAT170347, JAT170350).

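Abstract: The traditional Laplacian score feature selection algorithm is suited only to single-label learning and cannot be applied directly to multi-label learning. To address this problem, a Laplacian score feature selection algorithm for multi-label tasks is proposed. First, the sample similarity matrix is rebuilt by considering how samples correlate over the whole label space, that is, through the labels they are jointly associated with and jointly not associated with. Then, the judgment of correlation and redundancy between features is introduced into the Laplacian score algorithm, and a forward greedy search strategy evaluates, one candidate at a time, the joint contribution of the candidate feature and the already selected features, which is used to assess feature importance. Finally, experiments are carried out on 6 multi-label data sets with 5 different evaluation metrics. The experimental results show that, compared with Multi-label Dimensionality reduction via Dependence Maximization (MDDM), feature selection for Multi-Label Naive Bayes classification (MLNB), and feature selection for multi-label classification using multivariate mutual information (PMU), the proposed algorithm not only achieves the best classification performance but also shows significant superiority in up to 65% of cases.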

Key words: feature selection, Laplacian, multi-label classification, search strategy, feature relevance

Abstract: The traditional Laplacian score for feature selection is designed for single-label learning and cannot be directly applied to multi-label tasks. To address this problem, a multi-label feature selection algorithm based on Laplacian score was proposed. Firstly, the sample similarity matrix was reconstructed from the correlation between samples over the whole label space, namely the labels with which two samples are jointly associated and jointly not associated. Then, the correlation and redundancy between features were introduced into the Laplacian score, and a forward greedy search strategy was used to evaluate, in turn, the joint contribution of each candidate feature and the already selected features, which served as the measure of feature importance. Finally, experiments were conducted on six multi-label data sets with five different evaluation criteria. The experimental results show that, compared with Multi-label Dimensionality reduction via Dependence Maximization (MDDM), feature selection for Multi-Label Naive Bayes classification (MLNB), and feature selection for multi-label classification using multivariate mutual information (PMU), the proposed algorithm not only has the best classification performance but also shows significant superiority in up to 65% of cases.

Key words: feature selection, Laplacian score, multi-label classification, search strategy, feature relevance
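
To make the first two ideas in the abstract concrete, the following is a minimal sketch of the classic single-label Laplacian score applied to a label-derived similarity matrix. It is not the authors' algorithm: the abstract does not give the exact similarity formula, so `label_similarity` (the fraction of labels on which two samples agree, whether both relevant or both irrelevant) is an assumption made here for illustration, and the forward greedy search with redundancy handling is omitted. The function names and the toy data are hypothetical.

```python
import numpy as np


def label_similarity(Y):
    """Assumed similarity: share of labels on which two samples agree.

    Y is an (n_samples, n_labels) 0/1 array. Agreement counts labels that
    are relevant to both samples plus labels that are irrelevant to both.
    """
    Y = np.asarray(Y, dtype=float)
    both_relevant = Y @ Y.T
    both_irrelevant = (1.0 - Y) @ (1.0 - Y).T
    return (both_relevant + both_irrelevant) / Y.shape[1]


def laplacian_scores(X, S):
    """Classic Laplacian score of every feature (column of X).

    A smaller score means the feature varies little between samples
    that the similarity matrix S treats as close.
    """
    X = np.asarray(X, dtype=float)
    d = S.sum(axis=1)                        # diagonal of the degree matrix D
    L = np.diag(d) - S                       # graph Laplacian L = D - S
    Xc = X - (d @ X) / d.sum()               # remove the D-weighted mean
    num = (Xc * (L @ Xc)).sum(axis=0)        # f^T L f for each feature
    den = (Xc ** 2 * d[:, None]).sum(axis=0) # f^T D f for each feature
    return num / np.maximum(den, 1e-12)      # guard against constant features


# Toy data: feature 0 follows the label pattern, feature 1 does not.
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
S = label_similarity(Y)
scores = laplacian_scores(X, S)
ranking = np.argsort(scores)                 # best (smallest score) first
```

In this toy case the feature that follows the label pattern receives the smaller score and is ranked first. A ranking of this kind scores each feature in isolation, which is the limitation the abstract's second step targets by evaluating candidate features jointly with those already selected.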

CLC number: