Journal of Computer Applications, 2015, Vol. 35, Issue (10): 2761-2765. DOI: 10.11772/j.issn.1001-9081.2015.10.2761

• Paper from the 15th China Conference on Machine Learning (CCML2015)

Multi-label K nearest neighbor algorithm by exploiting label correlation

TAN Hefeng, LIU Zhengyi

  1. College of Computer Science and Technology, Anhui University, Hefei Anhui 230601, China
  • Received: 2015-06-01 Revised: 2015-06-24 Online: 2015-10-10 Published: 2015-10-14
  • Corresponding author: TAN Hefeng (born 1990), female, from Anqing, Anhui, master's student; research interests: machine learning, artificial intelligence; 1060895242@qq.com
  • About the author: LIU Zhengyi (born 1978), female, from Wuhu, Anhui, associate professor, Ph.D.; research interest: artificial intelligence.
  • Funding:
    Key Science and Technology Research Program of Anhui Province (1301b042020); Specialized Research Fund for the Doctoral Program of Higher Education (20133401110009); Graduate Academic Innovation Project of Anhui University (Ygh100166).

Abstract: Since the Multi-Label K Nearest Neighbor (ML-KNN) classification algorithm ignores the correlation between labels, a multi-label K nearest neighbor classification algorithm exploiting label correlation, named CML-KNN, was proposed. Firstly, the conditional probability between each pair of labels in the label set was calculated. Secondly, for the label to be predicted, its conditional probabilities with respect to the labels already predicted were ranked and the maximum was taken. Finally, this maximum was multiplied by the value of the corresponding label, and the product was combined with Maximum A Posteriori (MAP) estimation to construct the multi-label classification model used to predict the new label. The experimental results show that on the Emotions dataset CML-KNN outperforms all four comparison algorithms, namely ML-KNN, AdaboostMH, RAkEL and BPMLL, while on the Yeast and Enron datasets it falls below ML-KNN and RAkEL on only one or two evaluation metrics. These results indicate that CML-KNN achieves good classification performance.

Key words: label correlation, Multi-Label K Nearest Neighbor (ML-KNN), conditional probability, multi-label classification
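The abstract describes CML-KNN only in outline. The Python sketch below illustrates the two steps it states explicitly (pairwise label conditional probabilities, then the maximum over already-predicted labels), together with the standard ML-KNN MAP decision rule that CML-KNN builds on. It is not the authors' implementation. Assumptions: the function names are hypothetical; conditional probabilities are plain co-occurrence frequencies P(l_j = 1 | l_i = 1) without smoothing; step 2 uses the direction P(target | already-predicted label); and which labels count as "already predicted" is left to the caller (for example, all labels earlier in a fixed order). The ML-KNN part takes neighbour label counts as input instead of searching for the k nearest neighbours, and uses a smoothing parameter `s` (default 1). The abstract does not specify exactly how the maximum, multiplied by the corresponding label value, enters the MAP decision, so the sketch stops short of that combination.

```python
# A sketch under stated assumptions; helper names are hypothetical,
# not taken from the CML-KNN paper.
from typing import List, Optional, Sequence, Tuple


def label_conditional_probs(Y: Sequence[Sequence[int]]) -> List[List[float]]:
    """Step 1: P[i][j] estimates P(label j relevant | label i relevant).

    Y is a 0/1 training label matrix (rows = instances, columns = labels).
    Uses plain co-occurrence frequencies without smoothing (an assumption).
    """
    q = len(Y[0])
    count = [0] * q                      # instances in which label i is relevant
    joint = [[0] * q for _ in range(q)]  # instances in which labels i and j co-occur
    for row in Y:
        relevant = [i for i in range(q) if row[i]]
        for i in relevant:
            count[i] += 1
            for j in relevant:
                joint[i][j] += 1
    return [[joint[i][j] / count[i] if count[i] else 0.0 for j in range(q)]
            for i in range(q)]


def most_correlated_predicted(
    P: Sequence[Sequence[float]], target: int, predicted: Sequence[int]
) -> Tuple[float, Optional[int]]:
    """Step 2: among already-predicted labels k, return max P(target | k) and that k.

    The conditioning direction P(target | k) is an assumption.
    """
    if not predicted:
        return 0.0, None
    best = max(predicted, key=lambda k: P[k][target])
    return P[best][target], best


def mlknn_map(
    counts_train: Sequence[Sequence[int]],
    Y: Sequence[Sequence[int]],
    counts_test: Sequence[int],
    k: int,
    s: float = 1.0,
) -> List[int]:
    """Plain ML-KNN MAP decision for one test instance (no label correlation).

    counts_train[i][l]: how many of training instance i's k nearest neighbours
    (itself excluded) carry label l; counts_test[l]: the same for the test
    instance. s is the smoothing parameter.
    """
    m, q = len(Y), len(Y[0])
    pred = []
    for l in range(q):
        n1 = sum(Y[i][l] for i in range(m))
        prior1 = (s + n1) / (2 * s + m)   # P(H1): label l relevant
        prior0 = 1.0 - prior1             # P(H0): label l irrelevant
        c1 = [0] * (k + 1)                # c1[j]: l relevant, j neighbours carry l
        c0 = [0] * (k + 1)                # c0[j]: l irrelevant, j neighbours carry l
        for i in range(m):
            if Y[i][l]:
                c1[counts_train[i][l]] += 1
            else:
                c0[counts_train[i][l]] += 1
        j = counts_test[l]
        like1 = (s + c1[j]) / (s * (k + 1) + sum(c1))  # P(E_j | H1)
        like0 = (s + c0[j]) / (s * (k + 1) + sum(c0))  # P(E_j | H0)
        pred.append(1 if prior1 * like1 > prior0 * like0 else 0)
    return pred
```

For example, with the label matrix `[[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]`, label 2 is relevant in two of the three instances where label 1 is relevant but in only one of the three where label 0 is, so `most_correlated_predicted(P, 2, [0, 1])` selects label 1 with conditional probability 2/3.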
