Journal of Computer Applications, 2015, Vol. 35, Issue (1): 147-151. DOI: 10.11772/j.issn.1001-9081.2015.01.0147

Multi-label classification algorithm based on floating threshold classifiers combination

ZHANG Danpu1,2, FU Zhongliang1, WANG Lili1,2, LI Xin1,2   

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2014-08-01 Revised:2014-09-19 Online:2015-01-01 Published:2015-01-26

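  • Corresponding author: ZHANG Danpu
  • About the authors: ZHANG Danpu (born 1986), female, from Pingdingshan, Henan, Ph.D. candidate; research interests: machine learning, pattern recognition. FU Zhongliang (born 1967), male, from Hechuan, Chongqing, research fellow and Ph.D. supervisor; research interests: machine learning, pattern recognition. WANG Lili (born 1987), female, from Zhoukou, Henan, Ph.D. candidate; research interests: machine learning, pattern recognition. LI Xin (born 1985), male, from Hanzhong, Shaanxi, Ph.D. candidate; research interests: graphics and image processing, pattern recognition.
  • Supported by:

    Science and Technology Support Program of Sichuan Province (2011GZ0171; 2012GZ0106).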

Abstract:

To address the multi-label classification problem, in which a target can belong to several classes at once, a multi-label classification algorithm based on a combination of floating threshold classifiers is proposed. First, the principle and error estimation of the AdaBoost algorithm with floating threshold classifiers (AdaBoost.FT) are analyzed, and it is proved that AdaBoost.FT overcomes the instability of classifiers with fixed segmentation thresholds when classifying points near the classification boundary, thereby improving the accuracy of single-label classification. Then, the Binary Relevance (BR) method is used to apply AdaBoost.FT to the multi-label classification problem, yielding the multi-label classification algorithm based on a combination of floating threshold classifiers, namely multi-label AdaBoost.FT. Experimental results show that, on the Emotions dataset, the average precision of multi-label AdaBoost.FT is about 4%, 8%, and 11% higher than that of AdaBoost.MH (the multiclass, multi-label version of AdaBoost based on Hamming loss), ML-kNN (Multi-Label k-Nearest Neighbor), and RankSVM (Ranking Support Vector Machine), respectively; on the Scene and Yeast datasets it is only about 3% and 1% lower than that of RankSVM, respectively. The experimental analysis indicates that multi-label AdaBoost.FT achieves good classification results on datasets that have a small number of labels or whose labels are largely uncorrelated.

Key words: real AdaBoost, floating threshold, maximum likelihood principle, multi-label classification, ensemble learning, Binary Relevance (BR) method
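
As a reading aid, the Binary Relevance (BR) decomposition mentioned in the abstract can be sketched as follows: one independent binary classifier is trained per label, and the per-label predictions are collected into a label vector. This is only a minimal illustration under stated assumptions. The base learner `train_stump` is a hypothetical stand-in (a single-feature threshold stump), not AdaBoost.FT, and the function names and toy data are invented for the example.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
BinaryModel = Callable[[Vector], int]


def train_stump(X: List[Vector], y: List[int]) -> BinaryModel:
    """Hypothetical stand-in base learner: best single-feature threshold stump.

    This is NOT AdaBoost.FT; any binary learner could be plugged in instead.
    """
    best = None  # (training errors, feature index, threshold, polarity)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                preds = [1 if polarity * (x[f] - t) >= 0 else 0 for x in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    _, f, t, polarity = best
    return lambda x: 1 if polarity * (x[f] - t) >= 0 else 0


def fit_binary_relevance(
    X: List[Vector],
    Y: List[List[int]],
    train: Callable[[List[Vector], List[int]], BinaryModel] = train_stump,
) -> List[BinaryModel]:
    """Binary Relevance: train one independent binary model per label column."""
    n_labels = len(Y[0])
    return [train(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models: List[BinaryModel], x: Vector) -> List[int]:
    """Collect each per-label decision into a 0/1 label vector."""
    return [model(x) for model in models]


# Toy data (invented): two features, two labels.
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
Y = [[0, 1], [0, 1], [1, 0], [1, 0]]
models = fit_binary_relevance(X, Y)
print(predict_binary_relevance(models, [0.85, 0.15]))
```

Because each label is learned separately, BR ignores correlations between labels, which is consistent with the abstract's observation that the method does well when labels are few or largely uncorrelated.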

CLC Number: