Improved MIMLBoost algorithm based on importance evaluation of labels

doi:10.11772/j.issn.1001-9081.2015.11.3122

Journal of Computer Applications, 2015, 35(11): 3122-3125.

DPCS 2015 Paper

HAO Ning¹, XIA Shixiong¹, NIU Qiang¹, ZHAO Zhijun²

1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 221116, China;
2. Ministry of Transport of Dinghai District, Zhoushan Zhejiang 316000, China

Received: 2015-06-17; Revised: 2015-07-09; Online: 2015-11-13

Chinese title: 基于类别重要度的MIMLBoost改进算法 (Improved MIMLBoost algorithm based on class importance)

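Corresponding author: XIA Shixiong (1961-), male, born in Fushun, Liaoning, professor, Ph.D.; research interests: intelligent control, data mining, industrial communication networks.
About the authors: HAO Ning (1990-), male, born in Xuzhou, Jiangsu, M.S. candidate; research interests: artificial intelligence, intelligent information processing. NIU Qiang (1974-), male, born in Nanyang, Henan, professor, Ph.D.; research interests: data mining, intelligent optimization algorithms, intelligent information processing. ZHAO Zhijun (1966-), male, born in Yuyao, Zhejiang, political work specialist; research interests: artificial intelligence, sensor networks.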
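Funding:
Prospective Joint Research Project of the Jiangsu Province Industry-University-Research Joint Innovation Fund (BY2014028-09); Open Fund of the Key Laboratory of Digital Ocean Science and Technology, State Oceanic Administration (KLDO201304); Scientific Research Program of the Zhejiang Provincial Department of Transportation (2014T25).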

Abstract

Abstract: To solve the class imbalance problem caused by the original degradation method in the MIMLBoost algorithm, this paper introduced class importance into the original algorithm and proposed an improved degradation method based on the evaluation of class labels. First, the proposed method used a clustering algorithm to group all bags into clusters. Each cluster could be treated as a concept among the multi-instance bags, and every class label could be quantified within each cluster. Then, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm was used to compute the importance of each label in each cluster. Finally, in each cluster the label with the lowest importance was removed, because such a label easily produces a large number of negative samples when MIML (Multi-Instance Multi-Label) samples are transformed into multi-instance samples. The experimental results show that the new degradation method is effective and that the improved algorithm outperforms the original algorithm, especially in terms of Hamming loss, coverage and ranking loss. This confirms that the new algorithm can effectively reduce the classification error rate and improve the precision of the algorithm.
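The degradation step summarized above can be illustrated with a short Python sketch. The abstract does not give the exact TF-IDF formulation, so the following is an assumption: each cluster is treated as a "document", the labels of its bags as "terms", term frequency is a label's share of all label occurrences in the cluster, and inverse document frequency uses the smoothed form log((1 + K) / (1 + df)) + 1 over K clusters. Cluster assignments are taken as given (any bag-level clustering, for example k-medoids over a bag distance, could produce them), and the function names `label_importance` and `degrade` are hypothetical. The degraded samples follow the MIMLBoost convention of appending the label to every instance of the bag and tagging the new bag +1 or -1.

```python
import math
from collections import Counter


def label_importance(clusters, bag_labels, num_labels):
    """TF-IDF importance of each label within each cluster (assumed formulation).

    clusters:   one cluster id per bag, produced by any bag-level clustering
    bag_labels: one set of label ids (0..num_labels-1) per bag
    Returns {cluster_id: [importance of label 0, ..., label num_labels-1]}.
    """
    cluster_ids = sorted(set(clusters))
    k = len(cluster_ids)
    # Term counts: how many bags in each cluster carry each label.
    counts = {c: Counter() for c in cluster_ids}
    for c, labels in zip(clusters, bag_labels):
        counts[c].update(labels)
    # Document frequency: number of clusters in which a label occurs at least once.
    df = Counter(l for c in cluster_ids for l in counts[c])
    importance = {}
    for c in cluster_ids:
        total = sum(counts[c].values()) or 1  # guard against clusters with no labels
        row = []
        for l in range(num_labels):
            tf = counts[c][l] / total
            idf = math.log((1 + k) / (1 + df[l])) + 1.0  # smoothed IDF, always > 0
            row.append(tf * idf)
        importance[c] = row
    return importance


def degrade(bags, clusters, bag_labels, num_labels):
    """MIML -> multi-instance single-label samples, skipping each cluster's least important label."""
    importance = label_importance(clusters, bag_labels, num_labels)
    # Lowest TF-IDF score per cluster; ties go to the smaller label id.
    dropped = {c: min(range(num_labels), key=lambda l: (row[l], l))
               for c, row in importance.items()}
    samples = []
    for bag, c, labels in zip(bags, clusters, bag_labels):
        for l in range(num_labels):
            if l == dropped[c]:
                continue
            # MIMLBoost-style degradation: append label l to every instance of the bag.
            mi_bag = [list(x) + [l] for x in bag]
            samples.append((mi_bag, 1 if l in labels else -1))
    return samples, dropped


if __name__ == "__main__":
    # Hypothetical toy data: 4 bags of 2-D instances, 2 clusters, 3 labels.
    bags = [[[0.0, 1.0]], [[0.2, 0.9]], [[5.0, 5.0], [5.1, 4.9]], [[4.8, 5.2]]]
    clusters = [0, 0, 1, 1]
    bag_labels = [{0}, {0, 1}, {2}, {1, 2}]
    samples, dropped = degrade(bags, clusters, bag_labels, num_labels=3)
    print(dropped)  # {0: 2, 1: 0}
    print(sum(y == -1 for _, y in samples), "negatives out of", len(samples))  # 2 negatives out of 8
```

With this toy data the original degradation would produce 12 samples (4 bags × 3 labels) containing 6 negatives; skipping one label per cluster leaves 8 samples with only 2 negatives, which is the kind of imbalance reduction the method aims for.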

Key words: Multi-Instance Multi-Label (MIML), MIMLBoost algorithm, Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, clustering, class imbalance

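Chinese abstract (translated): To address the class imbalance caused by the degradation process in the multi-instance multi-label learning algorithm MIMLBoost, the idea of artificial down-sampling was applied and class importance was introduced, yielding an improved degradation method based on class label evaluation. The method clusters the bags in the instance space and quantifies the labels of the label space onto the clusters. Then, cluster by cluster, it uses the TF-IDF algorithm to evaluate and screen the importance of each class label, removes labels of low importance, and concatenates the bags in the cluster with the remaining class labels. This reduces the occurrence of majority-class samples and completes the transformation of multi-instance multi-label samples into multi-instance single-label samples. Experiments were carried out on natural data sets. The results show that the improved algorithm performs better overall than the original algorithm, most notably on the three evaluation metrics Hamming loss, coverage and ranking loss, indicating that the proposed algorithm can effectively reduce the classification error rate and improve both the accuracy and the classification efficiency of the algorithm.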
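Hamming loss, coverage and ranking loss are standard multi-label evaluation measures, and lower is better for all three. The sketch below uses the definitions commonly found in the multi-instance multi-label literature, assuming that real-valued per-label scores and predicted label sets are available for each test bag. The function names, tie-breaking by label index, the requirement that every bag has at least one true label (for coverage), and letting bags with an empty or full label set contribute zero ranking loss are assumptions for illustration, not details taken from the paper.

```python
def hamming_loss(pred_sets, true_sets, num_labels):
    """Fraction of (bag, label) pairs predicted wrongly: mean |h(X) xor Y| / |label space|."""
    wrong = sum(len(p ^ t) for p, t in zip(pred_sets, true_sets))
    return wrong / (num_labels * len(true_sets))


def coverage(scores, true_sets):
    """Average number of steps down the score-ranked label list needed to cover all true labels.

    Assumes every bag has at least one true label.
    """
    total = 0
    for s, t in zip(scores, true_sets):
        # Rank 1 is the highest-scoring label; ties keep label-index order (sorted is stable).
        order = sorted(range(len(s)), key=lambda l: -s[l])
        rank = {l: r + 1 for r, l in enumerate(order)}
        total += max(rank[l] for l in t) - 1
    return total / len(true_sets)


def ranking_loss(scores, true_sets):
    """Average fraction of (relevant, irrelevant) label pairs that are ordered wrongly."""
    total = 0.0
    for s, t in zip(scores, true_sets):
        neg = [l for l in range(len(s)) if l not in t]
        if not t or not neg:
            continue  # no label pairs to compare; counted as zero loss (assumption)
        misordered = sum(1 for a in t for b in neg if s[a] <= s[b])
        total += misordered / (len(t) * len(neg))
    return total / len(true_sets)


if __name__ == "__main__":
    # Hypothetical predictions for two bags over three labels.
    true_sets = [{0, 2}, {1}]
    scores = [[0.9, 0.1, 0.4], [0.3, 0.2, 0.8]]
    pred_sets = [{0}, {2}]
    print(hamming_loss(pred_sets, true_sets, 3))  # 0.5
    print(coverage(scores, true_sets))            # 1.5
    print(ranking_loss(scores, true_sets))        # 0.5
```

Coverage here subtracts 1 so that a perfect ranking of a single true label scores 0; some libraries omit that offset, so values from different tools are not always directly comparable.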

CLC Number: TP181

HAO Ning, XIA Shixiong, NIU Qiang, ZHAO Zhijun. Improved MIMLBoost algorithm based on importance evaluation of labels[J]. Journal of Computer Applications, 2015, 35(11): 3122-3125.
