Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (10): 2849-2857. DOI: 10.11772/j.issn.1001-9081.2020111893

Special Issue: Artificial Intelligence

• Artificial intelligence •

Multi-label feature selection based on label-specific feature with missing labels

ZHANG Zhihao1,2, LIN Yaojin1,2, LU Shun1,2, GUO Chen1,2, WANG Chenxi1,2   

  1. School of Computer Science, Minnan Normal University, Zhangzhou Fujian 363000, China;
    2. Key Laboratory of Data Science and Intelligence Application of Fujian Provincial Universities (Minnan Normal University), Zhangzhou Fujian 363000, China
  • Received: 2020-12-03 Revised: 2021-03-01 Online: 2021-10-10 Published: 2021-10-27
  • Supported by:
    This work is partially supported by the General Program of the National Natural Science Foundation of China (62076116) and the General Program of the Natural Science Foundation of Fujian Province (2020J01811).

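  • Corresponding author: LIN Yaojin
  • About the authors: ZHANG Zhihao (born 1996), male, from Longyan, Fujian, M.S. candidate; main research interest: data mining. LIN Yaojin (born 1980), male, from Zhangzhou, Fujian, professor, Ph.D.; main research interests: data mining and machine learning. LU Shun (born 1996), male, from Sanming, Fujian, M.S. candidate; main research interest: data mining. GUO Chen (born 1996), male, from Putian, Fujian, M.S. candidate; main research interest: data mining. WANG Chenxi (born 1981), female, from Fengcheng, Liaoning, associate professor, M.S.; main research interests: data mining and machine learning.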

Abstract: Multi-label feature selection has been widely used in many domains, such as image classification and disease diagnosis. In practice, however, the label space of the data often contains missing labels, which destroys the structure of and the correlations between labels and makes it difficult for learning algorithms to select important features accurately. To address this problem, a Multi-label Feature Selection based on Label-specific feature with Missing Labels (MFSLML) algorithm was proposed. Firstly, the label-specific features of each class label were obtained via a sparse learning method. At the same time, the mapping relations between label-specific features and labels were constructed based on a linear regression model and were used to recover the missing labels. Finally, experiments were performed on 7 datasets using 4 evaluation metrics. Experimental results show that, compared with some state-of-the-art multi-label feature selection algorithms, such as the multi-label feature selection algorithm based on Max-Dependency and Min-Redundancy (MDMR) and Multi-label Feature selection with Missing Labels via considering feature interaction (MFML), MFSLML can increase the average precision by 4.61 to 5.5 percentage points. It can be seen that MFSLML achieves better classification performance.

Key words: feature selection, label-specific feature, missing label, linear regression, multi-label learning
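
The two steps summarized in the abstract (sparse learning of label-specific features, then a linear regression mapping used to recover missing labels) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the MFSLML formulation, whose objective function is not stated on this page: the L1-penalized least-squares objective, the ISTA solver, the function names, and the parameters `lam`, `n_iter`, and `threshold` are all hypothetical.

```python
import numpy as np

def soft_threshold(A, t):
    """Proximal operator of the L1 norm: shrink every entry toward zero by t."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def fit_sparse_weights(X, Y, lam=0.1, n_iter=500):
    """Minimise 0.5*||XW - Y||_F^2 + lam*||W||_1 with ISTA (assumed objective).

    Column j of W holds the regression weights for label j; its non-zero
    entries play the role of that label's label-specific features.
    """
    d, q = X.shape[1], Y.shape[1]
    W = np.zeros((d, q))
    # Step size 1/L, where L (squared spectral norm of X) is the Lipschitz
    # constant of the gradient of the least-squares term.
    L = max(np.linalg.norm(X, 2) ** 2, 1e-12)
    step = 1.0 / L
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = soft_threshold(W - step * grad, step * lam)
    return W

def recover_labels(X, Y_observed, missing_mask, W, threshold=0.5):
    """Fill only the missing entries with thresholded regression outputs."""
    scores = X @ W
    Y_filled = Y_observed.copy()
    Y_filled[missing_mask] = (scores[missing_mask] >= threshold).astype(Y_observed.dtype)
    return Y_filled

def rank_features(W):
    """Rank features by the L2 norm of their weight rows, largest first."""
    return np.argsort(-np.linalg.norm(W, axis=1))
```

In this sketch, missing entries of the label matrix are assumed to be stored as 0 with a separate boolean mask marking them, and a single global feature ranking is derived from the row norms of `W`; the published algorithm may handle both differently.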
