Multi-label classification algorithm based on Bayesian model

doi:10.11772/j.issn.1001-9081.2016.01.0052

Journal of Computer Applications, 2016, Vol. 36, Issue (1): 52-56.

32nd National Database Conference of China (NDBC 2015)

ZHANG Luoyang¹, MAO Jiali¹,², LIU Bin¹, WU Tao¹

1. College of Computer, China West Normal University, Nanchong, Sichuan 637002, China;
2. College of Software, East China Normal University, Shanghai 200241, China

Received: 2015-09-05 Revised: 2015-09-25 Online: 2016-01-10 Published: 2016-01-09
Corresponding author: MAO Jiali (born 1979), female, from Nanchong, Sichuan; associate professor and Ph.D. candidate; research interests: machine learning, pattern recognition.
About the authors: ZHANG Luoyang (born 1990), male, from Xuzhou, Jiangsu; M.S. candidate; research interests: machine learning, pattern recognition. LIU Bin (born 1991), male, from Nanyang, Henan; M.S. candidate; research interests: machine learning, pattern recognition. WU Tao (born 1991), male, from Ziyang, Sichuan; M.S. candidate; research interests: machine learning, pattern recognition.
Supported by:
This work is partially supported by the Natural Science Foundation of Sichuan Province (14ZB0140).

Abstract

Abstract: Binary Relevance (BR) ignores the correlations between labels, so a BR classifier tends to output labels that are absent from, or rare in, the training set. To address this shortcoming, the Multi-Label classification algorithm based on Bayesian Model (MLBM) and the Markov Multi-Label classification algorithm based on Bayesian Model (MMLBM) were proposed. Firstly, a simulation model was established to analyze the shortcomings of BR; considering that the value of a label should be decided jointly by the attribute confidence and the label confidence, MLBM was proposed. Specifically, the attribute confidence was calculated by a traditional classification algorithm, and the label confidence was obtained directly from the training set. Secondly, because MLBM had to consider all previously classified labels when calculating the label confidence, unrelated or weakly related labels could degrade the performance of the classifier. To overcome this weakness, MMLBM was proposed, which used a Markov model to simplify the calculation of the label confidence. Theoretical analysis and simulation experiments demonstrate that, compared with BR, the average classification accuracy of MMLBM increased by about 4.8% on the emotions dataset, about 9.8% on the yeast dataset and about 7.3% on the flags dataset. The experimental results show that MMLBM improves accuracy over BR considerably when the label cardinality of the instances in the dataset is large.

Key words: multi-label, Bayesian model, Markov model, K Nearest Neighbor (KNN), confidence
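
The abstract does not give the formulas, so the following is only a minimal sketch of the idea it describes, under stated assumptions. It assumes that each label already has an attribute confidence P(y_j = 1 | x) from some base classifier (for example KNN), that the label confidence is estimated by smoothed counting over the training labels, that the two are combined by multiplication, and that the Markov simplification conditions each label only on the one before it in a fixed order. All function names, the smoothing constant and the toy data are hypothetical and not taken from the paper. Label cardinality follows the usual definition, the average number of labels per instance.

```python
def label_cardinality(Y):
    """Average number of positive labels per training instance."""
    return sum(sum(row) for row in Y) / len(Y)


def prior_confidence(Y, j, alpha=1.0):
    """Smoothed estimate of P(y_j = b) for b in {0, 1} (alpha is an assumption)."""
    p1 = (sum(row[j] for row in Y) + alpha) / (len(Y) + 2 * alpha)
    return {0: 1.0 - p1, 1: p1}


def transition_confidence(Y, j, prev, alpha=1.0):
    """Smoothed estimate of P(y_j = b | y_{j-1} = prev) for b in {0, 1}."""
    counts = {0: alpha, 1: alpha}
    for row in Y:
        if row[j - 1] == prev:
            counts[row[j]] += 1
    total = counts[0] + counts[1]
    return {b: counts[b] / total for b in (0, 1)}


def predict_br(attr_conf, threshold=0.5):
    """BR-style baseline: threshold every label independently."""
    return [1 if p > threshold else 0 for p in attr_conf]


def predict_markov(attr_conf, Y):
    """Chain prediction where label j depends only on label j-1.

    attr_conf[j] is the assumed attribute confidence P(y_j = 1 | x).
    The multiplicative combination below is an assumption of this sketch.
    """
    pred = []
    for j, p in enumerate(attr_conf):
        if j == 0:
            label_conf = prior_confidence(Y, 0)
        else:
            label_conf = transition_confidence(Y, j, pred[-1])
        score_on = p * label_conf[1]
        score_off = (1.0 - p) * label_conf[0]
        pred.append(1 if score_on > score_off else 0)
    return pred


# Toy training labels: the second label never occurs without the first.
Y_TRAIN = [[1, 1], [1, 1], [1, 0], [0, 0], [0, 0], [0, 0]]
# Hypothetical attribute confidences for one test instance.
ATTR_CONF = [0.3, 0.6]

BR_PRED = predict_br(ATTR_CONF)
MARKOV_PRED = predict_markov(ATTR_CONF, Y_TRAIN)
print(BR_PRED, MARKOV_PRED, label_cardinality(Y_TRAIN))
```

With these toy values, independent thresholding yields [0, 1], a combination that never occurs in Y_TRAIN, while the chained version yields [0, 0]. Reading the abstract's "labels absent from the training set" as label combinations is an interpretation made for this example.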

CLC number: TP181

ZHANG Luoyang, MAO Jiali, LIU Bin, WU Tao. Multi-label classification algorithm based on Bayesian model[J]. Journal of Computer Applications, 2016, 36(1): 52-56.

