Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (10): 2807-2811. DOI: 10.11772/j.issn.1001-9081.2018040813


Multi-label classification algorithm based on gravitational model

LI Zhaoyu, WANG Jichao, LEI Man, GONG Qin   

  1. College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2018-04-20  Revised: 2018-06-05  Online: 2018-10-10  Published: 2018-10-13
  • Supported by:
    This work is partially supported by the Program for Changjiang Scholars and Innovative Research Team in Universities (IRT_16R72).

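  • Corresponding author: WANG Jichao
  • About the authors: LI Zhaoyu (born 1972), female, from Chongqing, associate professor, M.S.; main research interest: wireless communication systems. WANG Jichao (born 1993), male, from Zhengzhou, Henan, M.S. candidate; main research interests: machine learning, data mining. LEI Man (born 1992), female, from Chongqing, M.S. candidate; main research interests: recommender systems, social networks. GONG Qin (born 1993), female, from Chengdu, Sichuan, M.S. candidate; main research interests: natural language processing, deep learning, data mining.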

Abstract: To address the problem that existing multi-label classification algorithms cannot fully exploit the correlations between labels, a new multi-label classification algorithm based on the gravitational model, named MLBGM, was proposed. It mines the different correlations among labels by establishing positive and negative label correlation matrices. Firstly, all samples in the training set were traversed and the k nearest neighbors of each training sample were obtained. Secondly, according to the distribution of labels among the neighbors of each sample, positive and negative correlation matrices were established for each training sample. Then, the neighbor density and neighbor weights were calculated for each training sample. Finally, a multi-label classification model was constructed by calculating the interaction forces between data particles. The experimental results show that, compared with 5 contrast algorithms that do not consider negative correlation between labels, MLBGM reduces HammingLoss by 15.62% on average and increases MicroF1 and SubsetAccuracy by 7.12% and 14.88% on average, respectively. By making full use of the different correlations between labels, MLBGM obtains effective experimental results and outperforms the comparison algorithms.
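
For readers unfamiliar with the three evaluation measures named in the abstract, the following is a minimal sketch of their standard definitions. It is not the authors' evaluation code: the function names and the toy label sets are hypothetical, and each sample's labels are assumed to be represented as a Python set of label indices. Note that HammingLoss is an error rate, so a reduction is an improvement, while MicroF1 and SubsetAccuracy are better when higher.

```python
from typing import List, Set


def hamming_loss(y_true: List[Set[int]], y_pred: List[Set[int]], n_labels: int) -> float:
    """Fraction of (sample, label) pairs whose predicted relevance is wrong."""
    # The symmetric difference holds the labels that are missed or wrongly added.
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)


def micro_f1(y_true: List[Set[int]], y_pred: List[Set[int]]) -> float:
    """F1 computed from true/false positives and false negatives pooled over all labels."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true: List[Set[int]], y_pred: List[Set[int]]) -> float:
    """Fraction of samples whose predicted label set matches the true set exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


# Hypothetical toy data: 3 samples, 3 possible labels (indices 0..2).
y_true = [{0, 1}, {2}, {1, 2}]
y_pred = [{0, 1}, {1, 2}, {1}]

print("HammingLoss   :", hamming_loss(y_true, y_pred, n_labels=3))
print("MicroF1       :", micro_f1(y_true, y_pred))
print("SubsetAccuracy:", subset_accuracy(y_true, y_pred))
```

On this toy data, two of the nine sample-label pairs are wrong, the pooled counts are 4 true positives, 1 false positive and 1 false negative, and only the first sample is predicted exactly.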

Key words: multi-label classification, label correlation, gravitational model, neighbor density, neighbor weight

