Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (2): 503-509.DOI: 10.11772/j.issn.1001-9081.2019091626
• CCF Bigdata 2019 •
Yusheng CHENG1,2, Fan SONG1, Yibin WANG1,2, Kun QIAN1
Received: 2019-08-30
Revised: 2019-09-24
Accepted: 2019-10-09
Online: 2019-10-31
Published: 2020-02-10
Contact: Yusheng CHENG
About author: SONG Fan, born in 1992 in Tongling, Anhui, M. S. candidate, CCF member. His research interests include multi-label learning and neural networks.
Yusheng CHENG, Fan SONG, Yibin WANG, Kun QIAN. Multi-label feature selection algorithm based on conditional mutual information of expert feature[J]. Journal of Computer Applications, 2020, 40(2): 503-509.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2019091626
Dataset | Samples | Features | Labels | Training samples | Test samples |
---|---|---|---|---|---|
Health | 5 000 | 612 | 32 | 2 000 | 3 000 |
Recreation | 5 000 | 606 | 22 | 2 000 | 3 000 |
Artificial | 5 000 | 462 | 26 | 2 000 | 3 000 |
Reference | 5 000 | 793 | 33 | 2 000 | 3 000 |
Entertainment | 5 000 | 640 | 21 | 2 000 | 3 000 |
Business | 5 000 | 438 | 30 | 2 000 | 3 000 |
Computer | 5 000 | 681 | 33 | 2 000 | 3 000 |
Tab. 1 Multi-label datasets
Dataset | Original | MDDMspc | MDDMproj | PMU | MFSLS | MFSEF |
---|---|---|---|---|---|---|
Average | 0.625 8 | 0.611 7 | 0.607 9 | 0.610 8 | 0.645 8 | 0.649 2 |
Health | 0.681 2(3) | 0.679 4(4) | 0.651 6(6) | 0.670 9(5) | 0.723 7(2) | 0.728 0(1) |
Recreation | 0.454 7(4) | 0.449 7(5) | 0.462 8(3) | 0.444 1(6) | 0.510 2(2) | 0.522 5(1) |
Artificial | 0.509 4(3) | 0.497 4(4) | 0.484 8(6) | 0.490 9(5) | 0.536 3(2) | 0.536 4(1) |
Reference | 0.619 4(3) | 0.601 4(5) | 0.599 2(6) | 0.614 5(4) | 0.630 4(2) | 0.634 7(1) |
Entertainment | 0.602 3(3) | 0.551 3(6) | 0.558 8(5) | 0.567 1(4) | 0.603 2(2) | 0.604 0(1) |
Business | 0.879 8(1) | 0.870 7(5) | 0.873 1(4) | 0.862 8(6) | 0.876 2(3) | 0.876 5(2) |
Computer | 0.633 5(3) | 0.631 9(4) | 0.625 0(6) | 0.625 2(5) | 0.640 5(2) | 0.642 4(1) |
Tab. 2 Average Precision (AP, ↑) results of each algorithm on 7 datasets (ranks in parentheses)
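As a quick arithmetic check of Tab. 2, the sketch below recomputes the average row and the mean rank of each algorithm from the per-dataset entries. Only the numbers come from the table; the names `ap`, `ranks`, and `column_means` are illustrative and not taken from the paper.

```python
# AP values and ranks copied from Tab. 2 (rows: Health, Recreation, Artificial,
# Reference, Entertainment, Business, Computer).
algorithms = ["Original", "MDDMspc", "MDDMproj", "PMU", "MFSLS", "MFSEF"]
ap = [
    [0.6812, 0.6794, 0.6516, 0.6709, 0.7237, 0.7280],
    [0.4547, 0.4497, 0.4628, 0.4441, 0.5102, 0.5225],
    [0.5094, 0.4974, 0.4848, 0.4909, 0.5363, 0.5364],
    [0.6194, 0.6014, 0.5992, 0.6145, 0.6304, 0.6347],
    [0.6023, 0.5513, 0.5588, 0.5671, 0.6032, 0.6040],
    [0.8798, 0.8707, 0.8731, 0.8628, 0.8762, 0.8765],
    [0.6335, 0.6319, 0.6250, 0.6252, 0.6405, 0.6424],
]
ranks = [
    [3, 4, 6, 5, 2, 1],
    [4, 5, 3, 6, 2, 1],
    [3, 4, 6, 5, 2, 1],
    [3, 5, 6, 4, 2, 1],
    [3, 6, 5, 4, 2, 1],
    [1, 5, 4, 6, 3, 2],
    [3, 4, 6, 5, 2, 1],
]


def column_means(rows):
    """Return the mean of each column of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


mean_ap = dict(zip(algorithms, column_means(ap)))
mean_rank = dict(zip(algorithms, column_means(ranks)))

for name in algorithms:
    print(f"{name:9s} mean AP = {mean_ap[name]:.4f}, mean rank = {mean_rank[name]:.2f}")
```

The recomputed means agree with the average row to four decimal places, and the mean rank gives a second, scale-free summary of the same comparison.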
Dataset | Original | MDDMspc | MDDMproj | PMU | MFSLS | MFSEF |
---|---|---|---|---|---|---|
Average | 0.047 1 | 0.047 5 | 0.047 3 | 0.047 0 | 0.045 4 | 0.044 6 |
Health | 0.045 8(5) | 0.043 8(4) | 0.046 2(6) | 0.043 5(3) | 0.041 0(2) | 0.038 6(1) |
Recreation | 0.061 8(3) | 0.063 3(5) | 0.061 9(4) | 0.063 7(6) | 0.059 8(2) | 0.058 8(1) |
Artificial | 0.061 2(4) | 0.061 6(6) | 0.060 9(3) | 0.061 5(5) | 0.058 7(1) | 0.059 4(2) |
Reference | 0.031 4(4) | 0.032 4(6) | 0.031 1(3) | 0.030 7(2) | 0.031 5(5) | 0.028 8(1) |
Entertainment | 0.061 2(4) | 0.062 4(6) | 0.062 0(5) | 0.060 7(3) | 0.059 4(2) | 0.059 1(1) |
Business | 0.026 9(1) | 0.028 0(4.5) | 0.028 0(4.5) | 0.028 5(6) | 0.027 4(3) | 0.027 2(2) |
Computer | 0.041 2(6) | 0.040 8(4.5) | 0.040 8(4.5) | 0.040 5(3) | 0.040 1(2) | 0.040 0(1) |
Tab. 3 Hamming Loss (HL, ↓) results of each algorithm on 7 datasets (ranks in parentheses)
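The fractional ranks of 4.5 in Tab. 3 are consistent with the common convention of giving tied values the average of the positions they occupy. The sketch below illustrates that convention on the Business row, assuming a lower HL is better; the helper `average_ranks` is hypothetical and not taken from the paper.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


# Business row of Tab. 3, in column order:
# Original, MDDMspc, MDDMproj, PMU, MFSLS, MFSEF.
business_hl = [0.0269, 0.0280, 0.0280, 0.0285, 0.0274, 0.0272]
print(average_ranks(business_hl))
```

MDDMspc and MDDMproj share the value 0.028 0 and occupy positions 4 and 5, so each receives rank 4.5, as in the table.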
Dataset | Original | MDDMspc | MDDMproj | PMU | MFSLS | MFSEF |
---|---|---|---|---|---|---|
Average | 0.105 8 | 0.108 9 | 0.110 7 | 0.108 1 | 0.101 9 | 0.100 6 |
Health | 0.060 5(3) | 0.065 2(5) | 0.069 9(6) | 0.063 5(4) | 0.056 3(1) | 0.056 5(2) |
Recreation | 0.191 4(5) | 0.189 2(3) | 0.192 4(6) | 0.189 5(4) | 0.177 0(2) | 0.173 0(1) |
Artificial | 0.152 0(3) | 0.153 9(5) | 0.157 6(6) | 0.153 0(4) | 0.146 8(2) | 0.145 7(1) |
Reference | 0.091 9(4) | 0.092 5(5) | 0.093 3(6) | 0.087 0(2) | 0.088 3(3) | 0.085 5(1) |
Entertainment | 0.115 4(3) | 0.126 4(6) | 0.125 8(5) | 0.122 6(4) | 0.115 0(2) | 0.113 1(1) |
Business | 0.037 4(1) | 0.043 3(5) | 0.041 6(4) | 0.045 9(6) | 0.040 2(2) | 0.040 7(3) |
Computer | 0.092 2(4) | 0.091 9(3) | 0.094 5(5) | 0.095 2(6) | 0.089 6(2) | 0.089 4(1) |
Tab. 4 Ranking Loss (RL, ↓) results of each algorithm on 7 datasets (ranks in parentheses)
Dataset | Original | MDDMspc | MDDMproj | PMU | MFSLS | MFSEF |
---|---|---|---|---|---|---|
Average | 0.474 3 | 0.491 4 | 0.496 8 | 0.495 5 | 0.441 5 | 0.438 9 |
Health | 0.420 7(4) | 0.405 3(3) | 0.446 3(6) | 0.431 3(5) | 0.347 7(2) | 0.340 0(1) |
Recreation | 0.706 3(4) | 0.714 3(5) | 0.688 3(3) | 0.719 0(6) | 0.616 3(2) | 0.608 3(1) |
Artificial | 0.632 7(3) | 0.647 0(4) | 0.667 0(6) | 0.657 0(5) | 0.584 0(2) | 0.583 7(1) |
Reference | 0.473 0(3) | 0.497 3(6) | 0.496 0(5) | 0.491 0(4) | 0.461 7(2) | 0.454 0(1) |
Entertainment | 0.529 7(2) | 0.602 0(6) | 0.599 7(5) | 0.584 7(4) | 0.525 7(1) | 0.533 0(3) |
Business | 0.121 3(1) | 0.128 3(5) | 0.126 3(4) | 0.136 0(6) | 0.122 7(2.5) | 0.122 7(2.5) |
Computer | 0.436 3(3) | 0.445 3(4) | 0.453 7(6) | 0.449 7(5) | 0.432 3(2) | 0.430 7(1) |
Tab. 5 One-Error (OE, ↓) results of each algorithm on 7 datasets (ranks in parentheses)