Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (3): 659-662. DOI: 10.11772/j.issn.1001-9081.2015.03.659

• Advanced Computing •

  • Corresponding author: HE Peng
  • About the authors: HE Peng (1989-), male, born in Guangyuan, Sichuan, M.S. candidate; research interests: data mining, cloud computing, big data. ZHOU Lijuan (1969-), female, born in Harbin, Heilongjiang, professor, Ph.D.; research interests: data warehousing, data mining, knowledge discovery.
  • Supported by:

    National Key Technology R&D Program of China (2013BAH19F01)

Multi-label classification algorithm based on joint probability

HE Peng, ZHOU Lijuan   

  1. Information Engineering College, Capital Normal University, Beijing 100048, China
  • Received: 2014-10-17  Revised: 2014-11-26  Online: 2015-03-10  Published: 2015-03-13


Abstract:

Since the Multi-Label k-Nearest Neighbor (ML-kNN) algorithm ignores possible correlations among labels, a multi-label classification algorithm based on joint probability, coRrelation Multi-Label-kNN (RML-kNN), was proposed. First, the prior probability of each label was computed by traversing the sample space. Second, given the value (1 or 0) of a label, the conditional probability that the label appears m times within a sample's k-neighborhood was computed from the label's distribution in that neighborhood. Then, the joint probability distribution of multiple labels over the k-neighborhood was adopted as the multi-label classification model and computed over the sample space. Finally, the RML-kNN classification model was derived by maximizing the posterior probability. Theoretical analysis and comparison experiments on several datasets show that RML-kNN raises Subset Accuracy to at most 0.9612, a gain of up to 2.25% over ML-kNN; it significantly reduces Hamming Loss, reaching a minimum of 0.0022; and it raises Micro-FMeasure to at most 0.9767, up to 2.88% higher than ML-kNN. The experimental results show that RML-kNN outperforms ML-kNN because it incorporates label correlations into the classification process.

Key words: multi-label, correlation, joint probability, k-Nearest Neighbor (kNN)
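The estimation-then-MAP pipeline the abstract describes builds on the standard ML-kNN baseline: estimate each label's prior, count how often a label appears among a sample's k nearest neighbours conditioned on that label's value, then decide each label by maximizing the posterior. A minimal sketch of that baseline follows; all function names, the smoothing constant, and the brute-force neighbour search are illustrative assumptions, and the paper's RML-kNN further replaces the per-label posterior below with a joint probability distribution over labels, whose exact form is not specified in the abstract.

```python
# Illustrative sketch of the ML-kNN baseline that RML-kNN extends.
# Names, toy parameters, and the brute-force kNN search are assumptions,
# not the authors' implementation.
import numpy as np

def mlknn_fit(X, Y, k=3, s=1.0):
    """Estimate priors P(H_l=1) and conditionals P(C_l=m | H_l=b) per label.
    X: (n, d) feature matrix; Y: (n, q) binary label matrix; s: Laplace smoothing."""
    n, q = Y.shape
    prior1 = (s + Y.sum(axis=0)) / (2 * s + n)           # P(H_l = 1)
    # c[b, l, m]: number of training samples with H_l = b that have m
    # positive neighbours for label l among their k nearest neighbours
    c = np.zeros((2, q, k + 1))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                                    # exclude the sample itself
        nbrs = np.argsort(d)[:k]
        m = Y[nbrs].sum(axis=0).astype(int)              # neighbour label counts
        for l in range(q):
            c[Y[i, l], l, m[l]] += 1
    cond = (s + c) / (s * (k + 1) + c.sum(axis=2, keepdims=True))
    return X, Y, prior1, cond, k

def mlknn_predict(model, x):
    """MAP decision per label: argmax_b P(H_l=b) * P(C_l=m | H_l=b)."""
    X, Y, prior1, cond, k = model
    d = np.linalg.norm(X - x, axis=1)
    m = Y[np.argsort(d)[:k]].sum(axis=0).astype(int)
    idx = np.arange(Y.shape[1])
    post1 = prior1 * cond[1, idx, m]
    post0 = (1 - prior1) * cond[0, idx, m]
    return (post1 > post0).astype(int)
```

Note that the decision is taken independently for each label, which is exactly the limitation the paper targets: a joint model over the labels' neighbourhood counts can capture co-occurrence that these per-label posteriors discard.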

CLC number: