Journal of Computer Applications, 2026, Vol. 46, Issue 5: 1482-1489. DOI: 10.11772/j.issn.1001-9081.2025050567

• Data science and technology

Distributed multi-label feature selection method with feature-label neighborhood collaborative correlation

Xipei TAO1, Hengrong JU1,2, Xiaoxue FAN1, Xiaoyang ZOU1, Weiping DING1

  1. School of Artificial Intelligence and Computer Science, Nantong University, Nantong, Jiangsu 226019, China
  2. State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing, Jiangsu 210023, China
  • Received: 2025-05-22; Revised: 2025-06-19; Accepted: 2025-06-26; Online: 2025-07-08; Published: 2026-05-10
  • Contact: Hengrong JU
  • About the authors: TAO Xipei, born in 2001, M.S. candidate. His research interests include granular computing and rough sets.
    FAN Xiaoxue, born in 2000, M.S. candidate. Her research interests include granular computing and knowledge discovery.
    ZOU Xiaoyang, born in 2001, M.S. candidate. His research interests include community detection and social network analysis.
    DING Weiping, born in 1979, Ph.D., professor. His research interests include data mining, machine learning, granular computing, and rough sets.
  • Supported by:
    National Natural Science Foundation of China (62006128); State Key Laboratory for Novel Computer Software Technology at Nanjing University (KFKT2024B30); Nantong Natural Science Foundation (JC2024044)

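Distributed multi-label feature selection method with feature-label neighborhood collaborative correlation

Xipei TAO1, Hengrong JU1,2, Xiaoxue FAN1, Xiaoyang ZOU1, Weiping DING1

  1. School of Artificial Intelligence and Computer Science, Nantong University, Nantong, Jiangsu 226019
  2. State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023
  • Corresponding author: Hengrong JU
  • About the authors: TAO Xipei (born 2001), male, from Lianyungang, Jiangsu, M.S. candidate. Research interests: granular computing, rough sets.
    FAN Xiaoxue (born 2000), female, from Nantong, Jiangsu, M.S. candidate. Research interests: granular computing, knowledge discovery.
    ZOU Xiaoyang (born 2001), male, from Suzhou, Jiangsu, M.S. candidate. Research interests: community detection, social network analysis.
    DING Weiping (born 1979), male, from Changzhou, Jiangsu, professor, Ph.D., CCF member. Research interests: data mining, machine learning, granular computing, rough sets.
  • Funding:
    National Natural Science Foundation of China (62006128); State Key Laboratory for Novel Computer Software Technology at Nanjing University (KFKT2024B30); Nantong Natural Science Foundation (JC2024044)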

Abstract:

Traditional multi-label neighborhood rough sets treat all labels as a whole when calculating feature importance, so they fail to distinguish how much each label contributes to feature selection and ignore the noise introduced by irrelevant labels. To address these issues, a Distributed Multi-Label feature selection method with Feature-label Neighborhood Collaborative Correlation (DML-FNCC) was proposed. Firstly, bidirectional spectral clustering was used to mine the internal associations of the label space and the feature space simultaneously: decision-representative primary label clusters were extracted in the label space to reduce noise interference, while a spectral clustering mapping based on semantic relevance was constructed in the feature space to aggregate highly correlated features into modules. Secondly, neighborhood dependency was employed to quantify the degree of association between feature clusters and label clusters, and the feature subset most closely related to each label cluster was selected. Finally, a distributed framework was adopted to spread computational tasks across multiple nodes, further accelerating model training. Experimental results on 12 public datasets demonstrate that DML-FNCC outperforms existing multi-label feature selection approaches such as PMLFS (Partial Multi-Label Feature Selection) and WFDP (Weak-label Fuzzy Discernibility Pairs): it ranks first on average precision, Hamming loss, one-error, ranking loss, and coverage, which indicates improved classification performance.

Key words: spectral clustering, neighborhood rough set, distributed learning, feature selection, data mining
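
The second step of the abstract, scoring feature clusters against a label cluster by neighborhood dependency, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the classical neighborhood rough set notion of dependency (the fraction of samples whose δ-neighborhood is label-consistent), Euclidean distance, and an exact-match consistency test over the labels in a cluster. The function names, the `delta` parameter, and the toy data are all hypothetical.

```python
from math import dist


def project(rows, columns):
    """Keep only the given column indices of every row."""
    return [tuple(row[c] for c in columns) for row in rows]


def neighborhood_dependency(features, labels, delta):
    """Fraction of samples whose delta-neighborhood is label-consistent."""
    n = len(features)
    consistent = 0
    for i in range(n):
        # Assumption: sample i counts as consistent when every sample within
        # Euclidean distance delta has the same labels on this label cluster.
        if all(labels[j] == labels[i]
               for j in range(n)
               if dist(features[i], features[j]) <= delta):
            consistent += 1
    return consistent / n


def rank_feature_clusters(X, Y, feature_clusters, label_cluster, delta):
    """Score each feature cluster against one label cluster."""
    y_sub = project(Y, label_cluster)
    return [neighborhood_dependency(project(X, fc), y_sub, delta)
            for fc in feature_clusters]


# Toy data (made up): feature 0 separates the label, feature 1 does not.
X = [(0.0, 0.50), (0.1, 0.90), (1.0, 0.55), (1.1, 0.95)]
Y = [(1,), (1,), (0,), (0,)]
scores = rank_feature_clusters(X, Y, [[0], [1]], [0], delta=0.2)
print(scores)  # [1.0, 0.0]
```

In this toy case the first feature cluster gets dependency 1.0 and the second 0.0, so a selection step of the kind the abstract describes would keep the first cluster for this label cluster. Because each (feature cluster, label cluster) score is computed independently, the scores could in principle be evaluated on separate nodes.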

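Abstract:

Traditional multi-label neighborhood rough sets treat all labels as a whole when computing feature importance, so they cannot effectively distinguish how much different labels contribute to feature selection, and they ignore the noise introduced by irrelevant labels. To address these problems, a Distributed Multi-Label feature selection method with Feature-label Neighborhood Collaborative Correlation (DML-FNCC) is proposed. First, bidirectional spectral clustering is used to mine the internal associations of the label space and the feature space simultaneously: primary label clusters that are representative for decision-making are extracted in the label space to reduce noise interference, while a spectral clustering mapping based on semantic relevance is constructed in the feature space to aggregate highly correlated features into modules. Second, neighborhood dependency is used to quantify the degree of association between feature clusters and label clusters, and the feature subset most relevant to each label cluster is selected. Finally, a distributed framework is adopted to spread the computation across multiple nodes, further accelerating model training. Experimental results on 12 public datasets show that, compared with existing multi-label feature selection methods such as PMLFS (Partial Multi-Label Feature Selection) and WFDP (Weak-label Fuzzy Discernibility Pairs), DML-FNCC ranks first on average precision, Hamming loss, one-error, ranking loss, and coverage, effectively improving classification performance.

Key words: spectral clustering, neighborhood rough set, distributed learning, feature selection, data mining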
