Multilabel feature selection algorithm based on Fisher score and fuzzy neighborhood entropy

Journal of Computer Applications, 2023, Vol. 43, Issue (12): 3779-3789. DOI: 10.11772/j.issn.1001-9081.2022121841

Lin SUN¹, Tianjiao MA², Zhan’ao XUE²,³

1. College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin 300457, China
2. College of Computer and Information Engineering, Henan Normal University, Xinxiang Henan 453007, China
3. Engineering Lab of Intelligence Business & Internet of Things of Henan Province (Henan Normal University), Xinxiang Henan 453007, China

Received: 2022-12-09 Revised: 2023-01-29 Accepted: 2023-01-31 Online: 2023-02-17 Published: 2023-12-10
Corresponding author: Lin SUN
About the authors: MA Tianjiao, born in 1998, a native of Xinyang, Henan, M. S. candidate. Her research interests include multilabel learning.
XUE Zhan’ao, born in 1963, a native of Sanmenxia, Henan, Ph. D., professor, CCF senior member. His research interests include granular computing and three-way decision.
Supported by: National Natural Science Foundation of China (62076089)

Abstract

The Fisher score model does not fully consider feature-label and label-label correlations, and some neighborhood rough set models tend to neglect the uncertainty of knowledge granules in the boundary region. Both issues lead to low classification performance. To address them, a MultiLabel feature selection algorithm based on Fisher Score and Fuzzy neighborhood entropy (MLFSF) was proposed. Firstly, the Maximal Information Coefficient (MIC) was used to measure the degree of association between features and labels and to construct a feature-label relationship matrix. A label relationship matrix based on the adjusted cosine similarity was then defined to analyze the correlation between labels. Secondly, a second-order strategy was given to obtain multiple second-order label relationship groups, which were used to re-partition the multilabel universe. The score of each feature was obtained by enhancing strong correlations and weakening weak correlations between labels, and the Fisher score model was improved in this way to preprocess the multilabel data. Thirdly, the multilabel classification margin was introduced to define an adaptive neighborhood radius and neighborhood classes, from which the upper and lower approximation sets were constructed. On this basis, a multilabel rough membership function was presented to map the multilabel neighborhood rough set to a fuzzy set. The upper and lower approximation sets and the multilabel fuzzy neighborhood rough set model were then developed on the multilabel fuzzy neighborhood. From this model, the fuzzy neighborhood entropy and the multilabel fuzzy neighborhood entropy were defined to effectively measure the uncertainty of the boundary region. Finally, a Multilabel Fisher Score-based feature selection algorithm with second-order Label Correlation (MFSLC) was designed, and MLFSF was constructed from it.
Experimental results on 11 multilabel datasets with the Multi-Label K-Nearest Neighbor (MLKNN) classifier were compared with those of six state-of-the-art algorithms, including the Multilabel Feature Selection algorithm based on improved ReliefF (MFSR). MLFSF improves the mean Average Precision (AP) by 2.47 to 6.66 percentage points. In addition, MLFSF achieves the best values on all five evaluation metrics on most datasets.

Key words: multilabel learning, feature selection, Fisher score, multilabel fuzzy neighborhood rough set, fuzzy neighborhood entropy
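The abstract's third step rests on standard neighborhood rough set notions: δ-neighborhoods, lower and upper approximations (their difference is the boundary region), and a rough membership function that turns a crisp sample set into a fuzzy set. The sketch below illustrates these notions under several assumptions. It uses a fixed radius `delta` with Euclidean distance; MLFSF instead derives an adaptive radius from the multilabel classification margin, which is not reproduced here. It also treats a single label as a crisp sample set, uses a hypothetical linear fuzzy relation, and applies one common form of fuzzy neighborhood entropy. All function names are illustrative and are not the paper's.

```python
import numpy as np


def pairwise_distance(X):
    """Euclidean distance matrix between the rows (samples) of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def delta_neighborhoods(X, delta):
    """N[i, j] is True iff x_j lies in the delta-neighborhood of x_i.
    Assumption: fixed radius; MLFSF uses an adaptive, margin-based radius."""
    return pairwise_distance(X) <= delta


def approximations(N, target):
    """Lower/upper approximations of the sample set `target` (boolean mask,
    e.g. the samples carrying one label) under the neighborhood relation N."""
    target = np.asarray(target, dtype=bool)
    lower = np.array([target[row].all() for row in N])  # neighborhood inside target
    upper = np.array([target[row].any() for row in N])  # neighborhood meets target
    return lower, upper


def rough_membership(N, target):
    """mu(x_i) = |delta(x_i) & target| / |delta(x_i)|, mapping a crisp set to a fuzzy set.
    Every neighborhood contains x_i itself, so the denominator is never zero."""
    target = np.asarray(target, dtype=bool)
    return (N & target[None, :]).sum(axis=1) / N.sum(axis=1)


def fuzzy_relation(X, delta):
    """Hypothetical fuzzy neighborhood relation: similarity falls linearly
    from 1 (identical samples) to 0 at distance delta."""
    return np.clip(1.0 - pairwise_distance(X) / delta, 0.0, 1.0)


def fuzzy_neighborhood_entropy(S):
    """One common form: FNE = -(1/n) * sum_i log(|[x_i]| / n),
    where |[x_i]| = sum_j S[i, j] is the fuzzy cardinality of x_i's granule."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    return float(-np.mean(np.log(S.sum(axis=1) / n)))


X = np.array([[0.0], [0.1], [0.25], [0.9]])
has_label = np.array([True, True, False, False])
N = delta_neighborhoods(X, delta=0.2)
lower, upper = approximations(N, has_label)
boundary = upper & ~lower
mu = rough_membership(N, has_label)
fne = fuzzy_neighborhood_entropy(fuzzy_relation(X, delta=0.2))
```

In this toy data, the samples at 0.1 and 0.25 fall in the boundary region. Their neighborhoods mix labeled and unlabeled samples, and this is the kind of boundary-region uncertainty the abstract's entropy measures target.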

CLC number: TP181

Cite this article: Lin SUN, Tianjiao MA, Zhan’ao XUE. Multilabel feature selection algorithm based on Fisher score and fuzzy neighborhood entropy[J]. Journal of Computer Applications, 2023, 43(12): 3779-3789.


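Tab.1 Similarities, differences and computational complexities of MLFSF and five comparative algorithms

Algorithm	Similarities	Differences	Computational complexity
PMU [30]	Filter method; considers the correlation between feature subsets and the label set	1) Measures the correlation between feature subsets and the label set with multivariate mutual information; 2) Designs a non-transformation multilabel feature selection method based on an approximation of high-dimensional joint entropy; 3) Requires the number of features to be specified in advance	O(nmz²)
MUCO [31]	Considers label correlation	1) Analyzes label correlation with max-relevance min-redundancy; 2) Accounts for the redundancy between candidate and already-selected features with fuzzy mutual information; 3) Computes the similarity between samples with the Jaccard distance; 4) Designs multilabel feature selection based on fuzzy mutual information	O(n²)
MDDM [32]	Considers the correlation between features and labels	1) Describes the dependence between features and the associated labels with the Hilbert-Schmidt Independence Criterion, thereby measuring feature-label correlation; 2) Designs a multilabel feature extraction method by mapping the feature space to a low-dimensional space	O(m³)
MFSMR [19]	Filter method; uses fuzzy neighborhood rough set theory; considers label correlation	1) Measures label correlation with a fuzzy neighborhood similarity relation; 2) Evaluates feature-feature redundancy and feature-label relevance with fuzzy neighborhood mutual information; 3) Designs a feature selection method based on fuzzy neighborhood rough sets and max-relevance min-redundancy	O(m²(n+m) + z²(n+m) + nmz + m²n log n)
MFSR [33]	Filter method; considers label correlation	1) Measures label correlation with the Jaccard distance; 2) Measures feature similarity with a cosine similarity function; 3) Measures the similarity relation of samples over the whole sample space with an inter-sample similarity function; 4) Designs a multilabel feature selection method based on ReliefF	O(m(|U|+d)), where d is the number of labels that sample x_i does not have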
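Table 1 separates the compared methods largely by how they measure label correlation. MFSR uses the Jaccard distance, whereas MLFSF, according to the abstract, builds its label relationship matrix on the adjusted cosine similarity [24]. The sketch below computes both measures on a binary sample-by-label matrix `Y`. It centers each sample's row before taking the cosine, which mirrors Sarwar et al.'s item-based adjusted cosine; how exactly the paper carries this over to labels is an assumption here. The function names are hypothetical.

```python
import numpy as np


def jaccard_distance(Y):
    """Pairwise Jaccard distance between the label columns of a binary matrix Y (n x q)."""
    Y = np.asarray(Y, dtype=bool)
    inter = (Y[:, :, None] & Y[:, None, :]).sum(axis=0)
    union = (Y[:, :, None] | Y[:, None, :]).sum(axis=0)
    # Two labels that never occur are treated as identical (distance 0).
    sim = np.divide(inter, union, out=np.ones(inter.shape), where=union > 0)
    return 1.0 - sim


def adjusted_cosine(Y):
    """Adjusted cosine similarity between label columns: each sample's (row's)
    mean label value is subtracted before the cosine is taken (assumption)."""
    Yc = np.asarray(Y, dtype=float)
    Yc = Yc - Yc.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(Yc, axis=0)
    denom = np.outer(norms, norms)
    # Columns with zero norm after centering get similarity 0.
    return np.divide(Yc.T @ Yc, denom, out=np.zeros(denom.shape), where=denom > 0)


Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
D = jaccard_distance(Y)
S = adjusted_cosine(Y)
```

Unlike the Jaccard distance, the adjusted cosine is signed. In this toy matrix, labels 0 and 2 never co-occur and come out negatively correlated, while labels 0 and 1 are positively correlated. A signed measure like this is one plausible basis for "enhancing strong and weakening weak" label correlations.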

Tab.2 Details of multilabel datasets

No.	Dataset	Samples	Features	Labels	Avg. labels per sample	Domain
1	Birds	645	260	20	1.47	Audio
2	Cal500	502	68	174	26.04	Music
3	Computer	5 000	681	33	1.51	Text
4	Emotion	593	72	6	1.87	Music
5	Enron	1 702	1 001	53	3.38	Text
6	Image	2 000	294	5	1.24	Image
7	Medical	978	1 449	45	1.25	Text
8	Recreation	5 000	606	22	1.42	Text
9	Reference	5 000	793	33	1.17	Text
10	Scene	2 407	294	6	1.07	Image
11	Yeast	2 417	103	14	4.24	Biology
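The "Avg. labels per sample" column in Table 2 is the label cardinality: the mean number of relevant labels per sample. Dividing it by the number of labels gives the label density. For example, Cal500's 26.04 over 174 labels corresponds to a density of about 0.15. A minimal sketch, with hypothetical function names:

```python
import numpy as np


def label_cardinality(Y):
    """Average number of relevant labels per sample of a binary matrix Y (n x q)."""
    return float(np.asarray(Y).sum(axis=1).mean())


def label_density(Y):
    """Label cardinality normalized by the number of labels q."""
    Y = np.asarray(Y)
    return label_cardinality(Y) / Y.shape[1]


Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1],
              [0, 0, 1]])
card = label_cardinality(Y)
dens = label_density(Y)
```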

Fig. 1 Comparison of seven algorithms on eleven multilabel datasets in terms of AP (↑)
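Fig. 1 reports Average Precision (AP), the metric behind the abstract's 2.47 to 6.66 percentage-point claim. For each relevant label, AP asks what fraction of the labels ranked at or above it are also relevant, and then averages over relevant labels and samples. The sketch below follows the usual ranking-based definition associated with [34]. Two conventions are assumptions, and toolkits differ on them: tied scores share the better rank, and samples without any relevant label are skipped. The names are hypothetical.

```python
import numpy as np


def label_ranks(s):
    """rank(l) = 1 + number of labels scored strictly higher than l (ties share a rank)."""
    s = np.asarray(s, dtype=float)
    return 1 + (s[None, :] > s[:, None]).sum(axis=1)


def average_precision(scores, Y):
    """Multilabel ranking AP, averaged over samples with at least one relevant label."""
    scores = np.asarray(scores, dtype=float)
    Y = np.asarray(Y, dtype=bool)
    per_sample = []
    for s, y in zip(scores, Y):
        if not y.any():
            continue  # AP is undefined for a sample without relevant labels
        rel_ranks = label_ranks(s)[y]
        # For each relevant label: fraction of labels ranked at or above it that are relevant.
        prec = [(rel_ranks <= rl).sum() / rl for rl in rel_ranks]
        per_sample.append(np.mean(prec))
    return float(np.mean(per_sample))


scores = np.array([[0.9, 0.8, 0.1],
                   [0.2, 0.7, 0.4]])
Y = np.array([[1, 0, 1],
              [0, 1, 0]])
ap = average_precision(scores, Y)
```

Here, the first sample ranks an irrelevant label above one of its relevant labels, giving a per-sample AP of 5/6. The second sample is ranked perfectly, so the overall AP is 11/12.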

Tab.3 Comparison of seven algorithms on eleven multilabel datasets in terms of FN (↓)

Dataset	Original features	PMU	MUCO	MDDM-proj	MDDM-spc	MFSMR	MFSR	MLFSF
Mean	511	260	181	333	351	111	105	89
Birds	260	260	214	247	191	260	181	169
Cal500	68	20	17	16	16	10	25	23
Computer	681	183	196	477	450	125	125	122
Emotion	72	72	61	22	31	40	40	35
Enron	1 001	70	71	150	261	100	125	121
Image	294	81	77	111	133	100	125	113
Medical	1 449	1 449	606	1 332	1 449	80	80	73
Recreation	606	152	180	443	491	200	150	50
Reference	793	196	180	488	473	150	150	150
Scene	294	293	291	290	278	100	100	78
Yeast	103	85	102	90	91	50	50	47
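Table 3 lists how many features (FN) each method keeps. In a ranking-based filter stage such as the Fisher score step, the kept subset is usually obtained by sorting features by score and truncating to the top k. The sketch below uses the classic single-label Fisher score, Σ_c n_c (μ_c − μ)² / Σ_c n_c σ_c². As a simplifying assumption, it averages that score over labels treated as independent binary problems. It does not reproduce MLFSF's MIC-based, second-order label-correlation weighting, and all names are hypothetical.

```python
import numpy as np


def fisher_score(x, y):
    """Classic Fisher score of a single feature x for class labels y:
    sum_c n_c (mu_c - mu)^2 / sum_c n_c sigma_c^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    mu = x.mean()
    num = den = 0.0
    for c in np.unique(y):
        xc = x[y == c]
        num += xc.size * (xc.mean() - mu) ** 2
        den += xc.size * xc.var()  # population variance within class c
    if den == 0.0:
        return np.inf if num > 0 else 0.0  # perfectly separated or constant feature
    return num / den


def multilabel_fisher_scores(X, Y):
    """Assumption: average the binary Fisher score over labels treated independently
    (no label correlation, unlike MLFSF's second-order weighting)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    return np.array([np.mean([fisher_score(X[:, j], Y[:, l]) for l in range(Y.shape[1])])
                     for j in range(X.shape[1])])


def select_top_k(scores, k):
    """Indices of the k highest-scoring features; k plays the role of FN."""
    return np.argsort(-scores, kind="stable")[:k]


X = np.array([[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 2.0]])
Y = np.array([[0], [0], [1], [1]])
scores = multilabel_fisher_scores(X, Y)
subset = select_top_k(scores, k=1)
```

Feature 0 separates the two classes with tight within-class spread, so its score (about 100) far exceeds that of feature 1 (0.25), and it is the one kept.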

Tab.4 Comparison of eight algorithms on eleven multilabel datasets in terms of five metrics

Metric	Dataset	MLKNN	PMU	MUCO	MDDM-proj	MDDM-spc	MFSMR	MFSR	MLFSF
AP (↑)	Birds	0.688 9	0.688 9	0.688 9	0.700 2	0.703 1	0.723 6	0.715 9	0.741 0
	Cal500	0.479 3	0.484 8	0.482 7	0.480 1	0.480 2	0.490 5	0.490 1	0.492 6
	Computer	0.639 0	0.644 7	0.658 0	0.643 2	0.646 1	0.643 8	0.638 5	0.642 2
	Emotion	0.757 8	0.757 8	0.771 5	0.772 2	0.778 8	0.771 6	0.785 9	0.799 8
	Enron	0.544 6	0.579 2	0.583 4	0.583 7	0.584 9	0.604 7	0.613 7	0.622 2
	Image	0.756 8	0.758 4	0.757 7	0.750 9	0.757 8	0.815 9	0.756 6	0.834 5
	Medical	0.770 0	0.770 0	0.791 4	0.778 7	0.770 0	0.423 6	0.423 5	0.851 0
	Recreation	0.469 9	0.480 7	0.508 8	0.492 4	0.488 8	0.464 7	0.478 7	0.520 0
	Reference	0.617 5	0.628 5	0.643 9	0.633 8	0.623 6	0.600 1	0.608 6	0.647 0
	Scene	0.851 4	0.852 1	0.852 5	0.852 2	0.853 2	0.770 2	0.770 2	0.856 3
	Yeast	0.754 1	0.757 4	0.755 8	0.756 9	0.758 3	0.733 8	0.752 2	0.759 4
	Mean	0.666 3	0.673 0	0.681 3	0.676 8	0.676 8	0.640 2	0.639 4	0.706 0
HL (↓)	Birds	0.057 1	0.057 1	0.056 7	0.057 4	0.056 2	0.055 5	0.057 5	0.054 6
	Cal500	0.140 7	0.139 2	0.139 8	0.140 5	0.140 4	0.139 0	0.139 0	0.138 5
	Computer	0.040 2	0.038 6	0.038 0	0.039 5	0.039 7	0.038 3	0.038 7	0.037 8
	Emotion	0.228 1	0.228 1	0.230 1	0.234 0	0.219 8	0.218 3	0.210 5	0.196 8
	Enron	0.058 3	0.057 4	0.056 5	0.058 0	0.057 9	0.054 1	0.053 6	0.052 8
	Image	0.193 2	0.192 0	0.191 0	0.194 0	0.194 0	0.162 5	0.188 5	0.150 7
	Medical	0.017 3	0.017 3	0.017 2	0.016 7	0.017 3	0.027 5	0.028 1	0.012 6
	Recreation	0.061 2	0.061 1	0.059 5	0.060 8	0.060 3	0.063 6	0.063 2	0.058 7
	Reference	0.031 5	0.030 3	0.029 1	0.030 5	0.030 3	0.031 0	0.032 5	0.029 1
	Scene	0.097 0	0.096 8	0.102 1	0.095 9	0.092 4	0.127 6	0.130 2	0.092 4
	Yeast	0.199 6	0.202 1	0.200 0	0.199 4	0.199 6	0.208 8	0.201 7	0.199 7
	Mean	0.102 2	0.101 8	0.101 8	0.102 4	0.100 7	0.102 4	0.104 0	0.093 1
RL (↓)	Birds	0.126 6	0.126 6	0.120 0	0.125 1	0.123 4	0.102 1	0.110 1	0.097 2
	Cal500	0.190 7	0.189 7	0.189 5	0.189 8	0.189 8	0.183 3	0.184 1	0.183 2
	Computer	0.089 1	0.087 7	0.084 8	0.087 9	0.089 6	0.087 1	0.091 2	0.090 4
	Emotion	0.195 5	0.195 5	0.230 1	0.183 7	0.187 1	0.192 9	0.179 1	0.165 5
	Enron	0.110 1	0.110 0	0.107 9	0.113 3	0.112 3	0.104 9	0.099 2	0.098 1
	Image	0.205 1	0.203 7	0.203 8	0.214 1	0.203 9	0.149 0	0.208 5	0.133 3
	Medical	0.061 4	0.061 4	0.060 7	0.057 8	0.061 4	0.132 7	0.137 5	0.037 3
	Recreation	0.187 9	0.180 1	0.173 5	0.180 0	0.181 9	0.189 3	0.180 3	0.170 5
	Reference	0.091 6	0.085 7	0.082 2	0.087 5	0.086 4	0.090 0	0.091 0	0.077 4
	Scene	0.087 6	0.088 0	0.091 4	0.086 8	0.087 9	0.148 2	0.142 3	0.083 6
	Yeast	0.174 8	0.174 9	0.174 5	0.175 0	0.172 4	0.191 0	0.178 0	0.172 4
	Mean	0.138 2	0.136 3	0.138 0	0.136 5	0.136 0	0.142 8	0.145 6	0.119 0
OE (↓)	Birds	0.393 8	0.393 8	0.367 5	0.378 3	0.376 4	0.336 5	0.350 5	0.312 2
	Cal500	0.125 5	0.119 5	0.121 5	0.123 5	0.122 8	0.114 4	0.114 0	0.116 2
	Computer	0.434 3	0.429 7	0.413 8	0.429 7	0.428 7	0.439 1	0.436 8	0.433 4
	Emotion	0.362 8	0.362 8	0.342 8	0.340 0	0.302 9	0.309 5	0.286 5	0.272 1
	Enron	0.438 9	0.343 7	0.355 5	0.342 0	0.341 4	0.314 7	0.297 8	0.296 7
	Image	0.377 0	0.370 0	0.371 0	0.376 0	0.371 0	0.284 4	0.373 2	0.252 5
	Medical	0.295 5	0.295 5	0.274 0	0.275 1	0.295 5	0.700 0	0.690 5	0.183 0
	Recreation	0.684 9	0.673 7	0.634 5	0.653 0	0.659 8	0.693 2	0.668 2	0.614 0
	Reference	0.478 2	0.473 8	0.452 5	0.472 5	0.476 1	0.512 8	0.494 9	0.452 5
	Scene	0.248 8	0.245 5	0.243 5	0.243 4	0.243 8	0.372 1	0.373 8	0.239 3
	Yeast	0.244 5	0.229 6	0.241 8	0.235 2	0.235 0	0.254 0	0.235 9	0.231 6
	Mean	0.371 3	0.358 0	0.347 1	0.351 7	0.350 3	0.393 7	0.392 9	0.309 5
CV (↓)	Birds	3.512 0	3.512 0	3.326 0	3.498 0	3.391 0	2.966 0	3.145 0	2.862 0
	Cal500	131.500 0	131.400 0	131.200 0	131.400 0	131.500 0	129.600 0	130.570 0	129.600 0
	Computer	4.226 0	4.164 0	4.089 0	4.223 0	4.252 0	4.121 0	4.345 0	4.272 0
	Emotion	1.944 0	1.944 0	1.901 0	1.900 0	1.919 0	1.970 0	1.867 0	1.788 0
	Enron	14.750 0	14.720 0	14.520 0	15.180 0	14.990 0	14.450 0	13.850 0	13.810 0
	Image	1.082 0	1.088 0	1.090 0	1.123 0	1.084 0	0.869 0	1.098 0	0.811 4
	Medical	3.634 0	3.634 0	3.615 0	3.425 0	3.634 0	6.846 0	7.160 0	2.465 0
	Recreation	4.988 0	4.801 0	4.690 0	4.828 0	4.867 0	5.011 2	4.814 0	4.668 0
	Reference	3.498 0	3.301 0	3.173 0	3.364 0	3.319 0	3.452 6	3.448 3	3.056 0
	Scene	0.524 6	0.527 5	0.523 3	0.518 8	0.525 8	0.822 5	0.801 7	0.509 0
	Yeast	6.449 0	6.400 0	6.426 0	6.403 0	6.406 0	6.645 0	6.455 0	6.363 0
	Mean	16.010 0	15.954 0	15.868 0	15.988 0	15.990 0	16.065 0	16.139 0	15.476 0
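The four loss-type metrics in Table 4 can be sketched from a score matrix and a binary ground-truth matrix:

- Hamming loss (HL) counts misclassified sample-label pairs and needs thresholded 0/1 predictions.
- Ranking loss (RL) counts mis-ordered relevant/irrelevant label pairs.
- One-error (OE) checks whether the top-ranked label is irrelevant.
- Coverage (CV) measures how far down the ranking one must go to cover every relevant label.

The CV values in Table 4 (around 130 for Cal500, which has 174 labels) suggest the unnormalized convention, maximum relevant rank minus one, which is used below. The remaining conventions are assumptions that vary between toolkits: ties count as ranking errors, argmax breaks ties by the lowest label index, and samples without relevant labels are skipped. The function names are hypothetical.

```python
import numpy as np


def _ranks(s):
    """rank(l) = 1 + number of labels scored strictly higher than l."""
    return 1 + (s[None, :] > s[:, None]).sum(axis=1)


def hamming_loss(pred, Y):
    """Fraction of misclassified (sample, label) pairs; needs 0/1 predictions."""
    return float(np.mean(np.asarray(pred, dtype=bool) != np.asarray(Y, dtype=bool)))


def one_error(scores, Y):
    """Fraction of samples whose top-scored label is irrelevant."""
    scores = np.asarray(scores, dtype=float)
    Y = np.asarray(Y, dtype=bool)
    keep = Y.any(axis=1)  # skip samples without relevant labels
    top = scores[keep].argmax(axis=1)  # ties resolved by the lowest label index
    return float(np.mean(~Y[keep][np.arange(keep.sum()), top]))


def coverage(scores, Y):
    """Average of (maximum rank of a relevant label - 1) over samples."""
    scores = np.asarray(scores, dtype=float)
    Y = np.asarray(Y, dtype=bool)
    return float(np.mean([_ranks(s)[y].max() - 1 for s, y in zip(scores, Y) if y.any()]))


def ranking_loss(scores, Y):
    """Fraction of (relevant, irrelevant) label pairs that are mis-ordered (ties count)."""
    scores = np.asarray(scores, dtype=float)
    Y = np.asarray(Y, dtype=bool)
    losses = []
    for s, y in zip(scores, Y):
        rel, irr = s[y], s[~y]
        if rel.size and irr.size:
            losses.append(np.mean(rel[:, None] <= irr[None, :]))
    return float(np.mean(losses))


scores = np.array([[0.9, 0.8, 0.1],
                   [0.2, 0.7, 0.4]])
Y = np.array([[1, 0, 1],
              [0, 1, 0]])
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
hl, oe = hamming_loss(pred, Y), one_error(scores, Y)
cv, rl = coverage(scores, Y), ranking_loss(scores, Y)
```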

Tab.5 Statistical results of five metrics for seven algorithms

Metric	$\chi_F^2$	$F_F$
AP	21.010 4	4.670 0
HL	23.045 6	5.365 1
RL	20.931 8	4.644 5
OE	15.624 8	2.545 8
CV	19.297 9	3.344 4
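Table 5 reports the Friedman statistic $\chi_F^2$ and its Iman-Davenport correction $F_F$. The captions state k = 7 algorithms and N = 11 datasets. Under the usual definitions, $\chi_F^2 = \frac{12N}{k(k+1)}\left(\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right)$ with average ranks $R_j$, and $F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2}$. $F_F$ follows an F distribution with k − 1 and (k − 1)(N − 1) degrees of freedom. For instance, the AP row's $\chi_F^2 = 21.010\,4$ gives $F_F \approx 4.670\,0$. A minimal sketch with illustrative function names:

```python
import numpy as np


def friedman_chi2(avg_ranks, n_datasets):
    """Friedman statistic from the average ranks R_j of k algorithms over N datasets."""
    R = np.asarray(avg_ranks, dtype=float)
    k, N = R.size, n_datasets
    return float(12 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4))


def iman_davenport(chi2, n_datasets, k):
    """F_F = (N - 1) chi2_F / (N (k - 1) - chi2_F)."""
    N = n_datasets
    return (N - 1) * chi2 / (N * (k - 1) - chi2)


ff_ap = iman_davenport(21.0104, n_datasets=11, k=7)  # AP row of Table 5
```

If all algorithms tie on average rank ((k + 1)/2 each), $\chi_F^2$ is zero, which is the null hypothesis the Friedman test checks.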

Fig. 2 Nemenyi test results of seven algorithms in terms of five metrics
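The Nemenyi post-hoc test in Fig. 2 declares two algorithms significantly different when their average ranks differ by more than the critical difference $CD = q_\alpha \sqrt{k(k+1)/(6N)}$. This section does not state the significance level, so α = 0.05 (and 0.10) are assumptions. The $q_\alpha$ values below are the commonly tabulated ones for k ≤ 10 algorithms, and the function names are hypothetical. With k = 7 and N = 11, CD is about 2.716 at α = 0.05 and about 2.481 at α = 0.10.

```python
import math

# Commonly tabulated Nemenyi critical values q_alpha for k = 2..10 algorithms.
Q_ALPHA = {
    0.05: {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
           7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164},
    0.10: {2: 1.645, 3: 2.052, 4: 2.291, 5: 2.459, 6: 2.589,
           7: 2.693, 8: 2.780, 9: 2.855, 10: 2.920},
}


def nemenyi_cd(k, n_datasets, alpha=0.05):
    """Critical difference CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return Q_ALPHA[alpha][k] * math.sqrt(k * (k + 1) / (6 * n_datasets))


def significantly_different(rank_a, rank_b, cd):
    """Two algorithms differ significantly if their average ranks differ by more than CD."""
    return abs(rank_a - rank_b) > cd


cd = nemenyi_cd(k=7, n_datasets=11, alpha=0.05)  # about 2.716
```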

References

1	SUN L, YIN T, DING W, et al. Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems[J]. Information Sciences, 2020, 537: 401-424. DOI: 10.1016/j.ins.2020.05.102
2	ZHANG Z H, LIN Y J, LU S, et al. Multi-label feature selection based on label-specific feature with missing labels[J]. Journal of Computer Applications, 2021, 41(10): 2849-2857 (in Chinese). DOI: 10.11772/j.issn.1001-9081.2020111893
3	SUN L, HUANG M M, XU J C. Weak label feature selection method based on neighborhood rough sets and Relief[J]. Computer Science, 2022, 49(4): 152-160 (in Chinese). DOI: 10.11896/jsjkx.210300094
4	RUAN Z H, XIAO X Y, HU W X, et al. Multiple power quality disturbance classification feature optimization based on multi-granularity feature selection and model fusion[J]. Power System Protection and Control, 2022, 50(14): 1-10 (in Chinese).
5	TENG J Y, GAO M, ZHENG X M, et al. Noise tolerable feature selection method for software defect prediction[J]. Computer Science, 2021, 48(12): 131-139 (in Chinese). DOI: 10.11896/jsjkx.201000168
6	SUN L, WANG T, DING W, et al. Feature selection using Fisher score and multilabel neighborhood rough sets for multilabel classification[J]. Information Sciences, 2021, 578: 887-912. DOI: 10.1016/j.ins.2021.08.032
7	WANG Z K, SHEN D S, WANG C X. Fisher Score fast multi-label feature selection algorithm based on text classification[J]. Computer Engineering, 2022, 48(2): 113-124 (in Chinese). DOI: 10.19678/j.issn.1000-3428.0060594
8	GUYON I, WESTON J, BARNHILL S, et al. Gene selection for cancer classification using support vector machines[J]. Machine Learning, 2002, 46: 389-422. DOI: 10.1023/a:1012487302797
9	GÜNES S, POLAT K, YOSUNKAYA S. Multi-class f-score feature selection approach to classification of obstructive sleep apnea syndrome[J]. Expert Systems with Applications, 2010, 37(2): 998-1004. DOI: 10.1016/j.eswa.2009.05.075
10	SUN L, HUANG J X, XU J C. Feature selection for imbalanced data based on neighborhood tolerance mutual information and whale optimization[J]. Journal of Computer Applications, 2023, 43(6): 1842-1854 (in Chinese). DOI: 10.11772/j.issn.1001-9081.2022050691
11	HANCER E, XUE B, ZHANG M. Differential evolution for filter feature selection based on information theory and feature ranking[J]. Knowledge-Based Systems, 2018, 140: 103-119. DOI: 10.1016/j.knosys.2017.10.028
12	WU D, GUO S Z. An improved Fisher Score feature selection method and its application[J]. Journal of Liaoning Technical University (Natural Science), 2019, 38(5): 472-479 (in Chinese).
13	DUAN J, HU Q H, ZHANG L J, et al. Feature selection for multi-label classification based on neighborhood rough sets[J]. Journal of Computer Research and Development, 2015, 52(1): 56-65 (in Chinese). DOI: 10.7544/issn1000-1239.2015.20140544
14	LIN Y, HU Q, LIU J, et al. Multi-label feature selection based on neighborhood mutual information[J]. Applied Soft Computing, 2016, 38: 244-256. DOI: 10.1016/j.asoc.2015.10.009
15	LIU J, LIN Y, LI Y, et al. Online multi-label streaming feature selection based on neighborhood rough set[J]. Pattern Recognition, 2018, 84: 273-287. DOI: 10.1016/j.patcog.2018.07.021
16	HUANG M, SUN L, XU J, et al. Multilabel feature selection using Relief and minimum redundancy maximum relevance based on neighborhood rough sets[J]. IEEE Access, 2020, 8: 62011-62031. DOI: 10.1109/access.2020.2982536
17	WU Y, LIU J, YU X, et al. Neighborhood rough set based multi-label feature selection with label correlation[J]. Concurrency and Computation: Practice and Experience, 2022, 34(22): e7162. DOI: 10.1002/cpe.7162
18	CHEN P, LIN M, LIU J. Multi-label attribute reduction based on variable precision fuzzy neighborhood rough set[J]. IEEE Access, 2020, 8: 133565-133576. DOI: 10.1109/access.2020.3010314
19	SUN L, YIN T, DING W, et al. Feature selection with missing labels using multilabel fuzzy neighborhood rough sets and maximum relevance minimum redundancy[J]. IEEE Transactions on Fuzzy Systems, 2022, 30(5): 1197-1211. DOI: 10.1109/tfuzz.2021.3053844
20	XU J, SHEN K, SUN L. Multi-label feature selection based on fuzzy neighborhood rough sets[J]. Complex & Intelligent Systems, 2022, 8: 2105-2129. DOI: 10.1007/s40747-021-00636-y
21	ZHANG D B, ZHANG B T, LING L W, et al. Carbon price forecasting based on secondary decomposition and aggregation strategy[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(11): 3094-3106 (in Chinese). DOI: 10.12341/jssms22287
22	RESHEF D N, RESHEF Y A, FINUCANE H K, et al. Detecting novel associations in large data sets[J]. Science, 2011, 334(6062): 1518-1524. DOI: 10.1126/science.1205438
23	LIU K, FENG S. An improved artificial bee colony algorithm for enhancing local search ability[J]. Journal of Henan Normal University (Natural Science Edition), 2021, 49(2): 15-24 (in Chinese). DOI: 10.16366/j.cnki.1000-2367.2021.02.003
24	SARWAR B, KARYPIS G, KONSTAN J, et al. Item-based collaborative filtering recommendation algorithms[C]// Proceedings of the 10th International Conference on World Wide Web. New York: ACM, 2001: 285-295. DOI: 10.1145/371920.372071
25	HUANG J X, LIN Z, LIU K Z, et al. Association rule mining analysis method considering massive events in a converter station[J]. Power System Protection and Control, 2022, 50(12): 117-125 (in Chinese).
26	YU Y. Survey on multi-label learning[J]. Computer Engineering and Applications, 2015, 51(17): 20-27 (in Chinese). DOI: 10.3778/j.issn.1002-8331.1506-0063
27	ZHENG T, ZHU L. Uncertainty measures of neighborhood system based rough sets[J]. Knowledge-Based Systems, 2015, 86: 57-65. DOI: 10.1016/j.knosys.2015.05.021
28	LIU Y, CHENG L, SUN L. Feature selection method based on K-S test and neighborhood rough sets[J]. Journal of Henan Normal University (Natural Science Edition), 2019, 47(2): 21-28 (in Chinese). DOI: 10.16366/j.cnki.1000-2367.2019.02.004
29	YAO S, XU F, ZHAO P, et al. Fuzzy entropy feature selection algorithm based on improved neighborhood granule[J]. Journal of Nanjing University (Natural Science), 2017, 53(4): 802-814 (in Chinese). DOI: 10.13232/j.cnki.jnju.2017.04.024
30	LEE J, KIM D-W. Feature selection for multi-label classification using multivariate mutual information[J]. Pattern Recognition Letters, 2013, 34(3): 349-357. DOI: 10.1016/j.patrec.2012.10.005
31	LIN Y, HU Q, LIU J, et al. Streaming feature selection for multilabel learning based on fuzzy mutual information[J]. IEEE Transactions on Fuzzy Systems, 2017, 25(6): 1491-1507. DOI: 10.1109/tfuzz.2017.2735947
32	ZHANG Y, ZHOU Z-H. Multilabel dimensionality reduction via dependence maximization[J]. ACM Transactions on Knowledge Discovery from Data, 2010, 4(3): Article No. 14. DOI: 10.1145/1839490.1839495
33	SUN L, CHEN Y S, XU J C. Multilabel feature selection algorithm based on improved ReliefF[J]. Journal of Shandong University (Natural Science), 2022, 57(4): 1-11 (in Chinese).
34	SCHAPIRE R E, SINGER Y. BoosTexter: a boosting-based system for text categorization[J]. Machine Learning, 2000, 39: 135-168. DOI: 10.1023/a:1007649029923
35	TSOUMAKAS G, VLAHAVAS I. Random k-labelsets: an ensemble method for multilabel classification[C]// Proceedings of the 2007 European Conference on Machine Learning. Berlin: Springer, 2007: 406-417.
36	CHEN L, CHEN D. Alignment based feature selection for multi-label learning[J]. Neural Processing Letters, 2019, 50: 2323-2344. DOI: 10.1007/s11063-019-10009-9
