Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (2): 503-509. DOI: 10.11772/j.issn.1001-9081.2019091626

• CCF Bigdata 2019

Multi-label feature selection algorithm based on conditional mutual information of expert feature

Yusheng CHENG1,2, Fan SONG1, Yibin WANG1,2, Kun QIAN1

  1. School of Computer and Information, Anqing Normal University, Anqing, Anhui 246011, China
    2. University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing, Anhui 246011, China
  • Received: 2019-08-30; Revised: 2019-09-24; Accepted: 2019-10-09; Online: 2019-10-31; Published: 2020-02-10
  • Contact: Yusheng CHENG
  • About the authors: SONG Fan, born in 1992, M. S. candidate. His research interests include multi-label learning, neural networks.
    WANG Yibin, born in 1970, M. S., professor. His research interests include multi-label learning, machine learning, software security.
    QIAN Kun, born in 1995, M. S. candidate. His research interests include multi-label learning, machine learning, data statistics.
  • Supported by:
    the Key University Natural Science Research Project of Anhui Province (KJ2017A352); the Program for Innovative Research Team in Anqing Normal University

Chinese title: Multi-label feature selection algorithm based on conditional mutual information of expert features

程玉胜 (CHENG Yusheng)1,2, 宋帆 (SONG Fan)1, 王一宾 (WANG Yibin)1,2, 钱坤 (QIAN Kun)1

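  1. School of Computer and Information, Anqing Normal University, Anqing, Anhui 246011
    2. University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing, Anhui 246011
  • Corresponding author: CHENG Yusheng
  • About the authors: SONG Fan (born 1992), male, from Tongling, Anhui; M. S. candidate; CCF member; research interests: multi-label learning, neural networks.
    WANG Yibin (born 1970), male, from Anqing, Anhui; professor; M. S.; research interests: multi-label learning, machine learning, software security.
    QIAN Kun (born 1995), male, from Chuzhou, Anhui; M. S. candidate; CCF member; research interests: multi-label learning, machine learning, data statistics.
  • Funding:
    Key Scientific Research Project of Anhui Provincial Universities (KJ2017A352); Research Innovation Team Construction Program of Anqing Normal University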

Abstract:

Feature selection plays an important role in the classification accuracy and generalization performance of classifiers. Existing multi-label feature selection algorithms mainly apply the max-relevance and min-redundancy criterion over the entire feature set without considering expert features; as a result, they suffer from long running times and high complexity. In real life, however, experts can often determine the overall prediction direction directly from a few key features. Identifying and exploiting this information should reduce the computation time of feature selection and can even improve classifier performance. Based on this observation, a multi-label feature selection algorithm based on conditional mutual information of expert features was proposed. First, the expert features were combined with the remaining features. Then, conditional mutual information was used to obtain a sequence of features ordered from strongest to weakest relevance to the label set. Finally, subspaces were partitioned to remove highly redundant features. The proposed algorithm was compared with other feature selection algorithms on 7 multi-label datasets. Experimental results show that it has certain advantages over them, and statistical hypothesis testing and stability analysis further demonstrate its effectiveness and rationality.

Key words: feature selection, expert feature, conditional mutual information, multi-label learning, local subspace
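The abstract does not state the exact scoring formula. For reference, the standard definition of conditional mutual information in terms of joint entropies is given below, followed by one plausible relevance score for a candidate feature $f$ given the expert feature set $E$ and labels $y_1, \dots, y_q$. The score $J(f)$ is an assumption for illustration, not a formula confirmed by the abstract.

$$
I(X;Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
$$

$$
J(f) = \sum_{j=1}^{q} I(f;\, y_j \mid E)
$$

Conditioning on $E$ credits $f$ only with label information that the expert features do not already carry; when $E$ is empty, $J(f)$ reduces to the sum of ordinary mutual informations $\sum_{j} I(f; y_j)$.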

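Abstract (Chinese):

Feature selection plays an important role in the classification accuracy and generalization performance of a classifier. Current multi-label feature selection algorithms mainly apply the max-relevance min-redundancy criterion over the full feature set and do not consider expert features, so their running time is long and their complexity is high. In practice, experts can directly determine the overall prediction direction from a few key features. Extracting and attending to this information will reduce the computation time of feature selection and may even improve classifier performance. On this basis, a multi-label feature selection algorithm based on conditional mutual information of expert features is proposed. First, the expert features are combined with the remaining features; then conditional mutual information is used to obtain a feature sequence ordered from strongest to weakest relevance to the label set; finally, subspaces are partitioned to remove highly redundant features. The algorithm was compared experimentally on 7 multi-label datasets, and the results show that it has certain advantages over other feature selection algorithms; statistical hypothesis testing and stability analysis further demonstrate the effectiveness and rationality of the proposed algorithm.

Key words: feature selection, expert feature, conditional mutual information, multi-label learning, local subspace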
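The three steps in the abstract (combine expert features with the remaining features, rank by conditional mutual information, then prune redundancy within subspaces) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: features and labels are assumed discrete, the relevance score is assumed to be the sum over labels of I(f; y_j | E) with E the expert features, expert features are assumed to be kept unconditionally, and the subspace step is a hypothetical rule that splits the ranked list into `n_subspaces` contiguous blocks and, inside each block, drops a feature whose mutual information with an already kept feature exceeds `max_redundancy`. All function and parameter names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of zero or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())


def cond_mutual_info(x, y, z_cols):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); plain MI when z_cols is empty."""
    return (entropy(x, *z_cols) + entropy(y, *z_cols)
            - entropy(x, y, *z_cols) - entropy(*z_cols))


def rank_by_expert_cmi(features, labels, expert_idx):
    """Order non-expert features by sum_j I(f; y_j | E), strongest first (assumed score)."""
    experts = [features[i] for i in expert_idx]
    scores = {
        i: sum(cond_mutual_info(col, y, experts) for y in labels)
        for i, col in enumerate(features)
        if i not in expert_idx
    }
    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(scores, key=scores.get, reverse=True)


def prune_subspaces(ranked, features, n_subspaces, max_redundancy):
    """Hypothetical local-subspace step: split the ranking into contiguous blocks and,
    inside each block, keep a feature only if its MI with every kept feature is small."""
    size = max(1, -(-len(ranked) // n_subspaces))  # ceiling division
    kept = []
    for start in range(0, len(ranked), size):
        block = []
        for i in ranked[start:start + size]:
            if all(cond_mutual_info(features[i], features[j], []) <= max_redundancy
                   for j in block):
                block.append(i)
        kept.extend(block)
    return kept


def select_features(features, labels, expert_idx, n_subspaces=2, max_redundancy=0.5):
    """Expert features first (assumed always kept), then the pruned CMI ranking."""
    ranked = rank_by_expert_cmi(features, labels, expert_idx)
    return list(expert_idx) + prune_subspaces(ranked, features, n_subspaces, max_redundancy)


# Toy data: 8 samples, one list per column. Feature 0 is the expert feature and
# equals label 0; features 1 and 2 are identical copies of label 1; feature 3 is
# independent of both labels.
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
labels = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

print(rank_by_expert_cmi(features, labels, [0]))  # [1, 2, 3]
print(select_features(features, labels, [0]))     # [0, 1, 3]
```

With the default two subspaces, the duplicate feature 2 shares a block with feature 1 and is pruned. Setting `n_subspaces=3` puts every feature in its own block, so feature 2 is never compared against feature 1 and survives (`[0, 1, 2, 3]`); this shows the trade-off of checking redundancy only locally, which is cheaper but can miss redundancy across blocks.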
