Journal of Computer Applications, 2019, Vol. 39, Issue (10): 2815-2821. DOI: 10.11772/j.issn.1001-9081.2019030483

• Artificial intelligence

Fast feature selection method based on mutual information in multi-label learning

XU Hongfeng1,2, SUN Zhenqiang2   

  1. School of Economics and Management, Guizhou Normal University, Guiyang Guizhou 550001, China;
    2. School of Informatics, Xiamen University, Xiamen Fujian 361005, China
  • Received: 2019-03-25; Revised: 2019-05-20; Online: 2019-06-03; Published: 2019-10-10
  • Supported by:
    This work is partially supported by the Department of Science and Technology of Guizhou Province ([2011]2215).

  • Corresponding author: XU Hongfeng
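  • About the authors: XU Hongfeng (1977-), male, born in Shangrao, Jiangxi, associate professor, Ph.D. candidate, CCF member; research interests: machine learning, deep learning, computer networks, enterprise informatization. SUN Zhenqiang (1993-), male, born in Jilin, Jilin, M.S. candidate; research interests: machine learning, data mining.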
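  • Foundation item:
    Science and Technology Fund of the Department of Science and Technology of Guizhou Province (Qiankehe J Zi [2011] No. 2215).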

Abstract: To address the high time complexity of traditional multi-label feature selection algorithms based on heuristic search, an Easy and Fast Multi-Label Feature Selection (EF-MLFS) method was proposed. Firstly, Mutual Information (MI) was used to measure the correlation between each feature dimension and each label dimension; then, the correlations obtained for each feature were summed and the features were ranked by this total; finally, feature selection was performed according to the total correlation. The proposed method was compared with six representative existing multi-label feature selection methods, such as the Max-Dependency and Min-Redundancy (MDMR) algorithm and the Multi-Label Naive Bayes (MLNB) method. Experimental results show that, when features are selected with EF-MLFS and then classified, the results on common multi-label classification metrics such as average precision, coverage and Hamming loss are the best among the compared methods. In addition, the method requires no global search, so its time complexity is significantly lower than that of MDMR and Pairwise Multi-label Utility (PMU).
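The three steps described in the abstract amount to scoring each feature f_i by the sum of its mutual information with every label y_j, score(f_i) = Σ_{j=1..q} I(f_i; y_j), and keeping the top-ranked features. The Python code below is a minimal sketch of that scoring rule, not the authors' implementation. It assumes the features are already discrete (continuous features would have to be discretized first) and estimates MI from empirical frequencies in nats; the names `mutual_information`, `ef_mlfs_rank` and `ef_mlfs_select` and the cut-off parameter `k` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences.

    Assumption: both sequences hold discrete values (e.g. pre-discretized
    features and 0/1 label indicators) and are aligned sample by sample.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c_ab in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts:
        # (c_ab / n) * log(c_ab * n / (c_a * c_b))
        mi += (c_ab / n) * math.log(c_ab * n / (count_x[a] * count_y[b]))
    return mi


def ef_mlfs_rank(X: List[List[int]], Y: List[List[int]]) -> List[Tuple[int, float]]:
    """Score every feature by the sum of its MI with all labels, best first.

    X: n samples x d discrete features; Y: n samples x q binary labels.
    Hypothetical helper illustrating the rank-by-total-correlation idea.
    """
    if not X or len(X) != len(Y):
        raise ValueError("X and Y must be non-empty with the same number of rows")
    d, q = len(X[0]), len(Y[0])
    features = [[row[i] for row in X] for i in range(d)]
    labels = [[row[j] for row in Y] for j in range(q)]
    # Total correlation of feature i = sum over labels of I(f_i; y_j).
    scores = [
        (i, sum(mutual_information(f, y) for y in labels))
        for i, f in enumerate(features)
    ]
    # Python's sort is stable (also with reverse=True), so tied features
    # keep their original column order.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


def ef_mlfs_select(X: List[List[int]], Y: List[List[int]], k: int) -> List[int]:
    """Return the indices of the k highest-scoring features (k is hypothetical)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return [i for i, _ in ef_mlfs_rank(X, Y)[:k]]
```

For example, with X = [[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1]] and Y = [[0, 1], [0, 1], [1, 0], [1, 0]], features 0 and 1 each determine both labels and score 2 ln 2 ≈ 1.386, while feature 2 is independent of both labels and scores 0, so `ef_mlfs_select(X, Y, 2)` returns [0, 1]. Each score needs one pass over the samples per feature-label pair, so this sketch costs O(n·d·q) for n samples, d features and q labels, with no search over feature subsets.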

Key words: multi-label learning, feature selection, mutual information, label correlation

