Journal of Computer Applications, 2023, Vol. 43, Issue (5): 1467-1472. DOI: 10.11772/j.issn.1001-9081.2022081154
Special Issue: Data Science and Technology
Yuanjiang LI, Jinsheng QUAN, Yangyi TAN, Tian YANG
Received: 2022-07-19
Revised: 2022-09-06
Accepted: 2022-10-12
Online: 2023-05-08
Published: 2023-05-10
Contact: Tian YANG
About author: LI Yuanjiang, born in 1999 in Yichang, Hubei, M. S. candidate. His research interests include data mining, rough set theory, and machine learning.
Yuanjiang LI, Jinsheng QUAN, Yangyi TAN, Tian YANG. Attribute reduction for high-dimensional data based on bi-view of similarity and difference[J]. Journal of Computer Applications, 2023, 43(5): 1467-1472.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022081154
Dataset | Samples | Features | Classes |
---|---|---|---|
Sonar | 208 | 60 | 2 |
SCADI | 70 | 205 | 7 |
Heart | 270 | 13 | 2 |
Allaml | 72 | 129 | 2 |
Lung | 203 | 3 312 | 2 |
GLI85 | 85 | 22 283 | 2 |
Biodeg | 1 055 | 41 | 2 |
ORL | 400 | 1 024 | 40 |
Pageblock | 5 472 | 10 | 5 |
Messidor | 1 151 | 19 | 2 |
Cane | 1 080 | 856 | 9 |
Tab. 1 Dataset information
Algorithm | Time complexity | Space complexity |
---|---|---|
DMG | ||
FFRS | ||
GBNRS | ||
ARSDM |
Tab. 2 Comparison of time/space complexity
Dataset | RAW | DMG | FFRS | GBNRS | ARSDM |
---|---|---|---|---|---|
Average | 79.33 | 82.97 | 77.56 | 75.12 | 84.04 |
Sonar | 73.45 | 75.00 | 76.00 | 75.52 | 75.50 |
SCADI | 77.08 | 90.00 | 92.50 | 76.37 | 97.50 |
Heart | 75.19 | 81.48 | 81.85 | 79.63 | 80.00 |
Allaml | 85.36 | 98.33 | 93.33 | 75.42 | 100.00 |
Lung | 90.27 | 88.33 | 94.44 | 79.45 | 95.00 |
GLI85 | 81.67 | 94.29 | 90.00 | 67.78 | 90.00 |
Biodeg | 81.30 | 81.11 | 76.06 | 81.32 | 83.37 |
ORL | 60.54 | 54.64 | 35.50 | 41.25 | 53.25 |
Pageblock | 99.31 | 99.42 | 99.47 | 99.32 | 99.48 |
Messidor | 61.84 | 62.37 | 64.43 | 63.51 | 62.43 |
Cane | 86.67 | 87.69 | 49.63 | 86.76 | 87.87 |
Tab. 3 Comparison of classification accuracy on reduction data under CART classifier
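The Average row of the CART accuracy table can be checked directly from the per-dataset rows. The following is a minimal sketch, assuming the Average row is the unweighted arithmetic mean over the 11 datasets rounded to two decimals; the names `cart_accuracy` and `column_means` are illustrative and do not come from the paper.

```python
# Per-dataset CART accuracies (%) copied from the table above, in row order:
# Sonar, SCADI, Heart, Allaml, Lung, GLI85, Biodeg, ORL, Pageblock, Messidor, Cane.
cart_accuracy = {
    "RAW":   [73.45, 77.08, 75.19, 85.36, 90.27, 81.67, 81.30, 60.54, 99.31, 61.84, 86.67],
    "DMG":   [75.00, 90.00, 81.48, 98.33, 88.33, 94.29, 81.11, 54.64, 99.42, 62.37, 87.69],
    "FFRS":  [76.00, 92.50, 81.85, 93.33, 94.44, 90.00, 76.06, 35.50, 99.47, 64.43, 49.63],
    "GBNRS": [75.52, 76.37, 79.63, 75.42, 79.45, 67.78, 81.32, 41.25, 99.32, 63.51, 86.76],
    "ARSDM": [75.50, 97.50, 80.00, 100.00, 95.00, 90.00, 83.37, 53.25, 99.48, 62.43, 87.87],
}


def column_means(table):
    """Return the unweighted mean of each column, rounded to two decimals."""
    return {name: round(sum(values) / len(values), 2) for name, values in table.items()}


averages = column_means(cart_accuracy)
best = max(averages, key=averages.get)  # column with the highest mean accuracy

print(averages)
print("highest average accuracy:", best)
```

Under that assumption the recomputed means agree with the Average row as printed, which suggests every dataset is weighted equally regardless of its sample count.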
Dataset | RAW | DMG | FFRS | GBNRS | ARSDM |
---|---|---|---|---|---|
Average | 70.34 | 89.81 | 79.81 | 79.38 | 91.77 |
Sonar | 87.49 | 89.50 | 82.50 | 80.71 | 92.50 |
SCADI | 41.73 | 95.00 | 90.00 | 80.96 | 100.00 |
Heart | 79.26 | 82.22 | 82.96 | 79.63 | 84.07 |
Allaml | 65.42 | 95.00 | 100.00 | 84.52 | 98.33 |
Lung | 68.57 | 92.22 | 97.22 | 80.43 | 92.78 |
GLI85 | 69.64 | 94.29 | 94.29 | 60.97 | 95.71 |
Biodeg | 88.36 | 88.58 | 80.96 | 86.27 | 89.81 |
ORL | 30.50 | 91.01 | 35.70 | 63.00 | 93.50 |
Pageblock | 97.58 | 97.65 | 98.00 | 97.59 | 97.74 |
Messidor | 73.84 | 73.85 | 66.69 | 74.89 | 75.65 |
Cane | 71.39 | 88.61 | 49.54 | 84.26 | 89.35 |
Tab. 4 Comparison of classification accuracy on reduction data under SVM classifier
Dataset | DMG | FFRS | GBNRS | ARSDM |
---|---|---|---|---|
Average | 107.94 | 2 016.07 | 7 246.94 | 156.09 |
Sonar | 0.85 | 21.02 | 57.55 | 3.16 |
SCADI | 0.26 | 33.82 | 86.37 | 0.41 |
Heart | 0.40 | 7.82 | 6.88 | 0.88 |
Allaml | 6.15 | 702.26 | 5 843.18 | 13.18 |
Lung | 25.78 | 1 829.63 | 4 517.38 | 52.91 |
GLI85 | 41.23 | 4 410.49 | 58 040.21 | 98.00 |
Biodeg | 17.57 | 395.04 | 134.85 | 41.87 |
ORL | 96.52 | 1 330.05 | 3 326.97 | 116.74 |
Pageblock | 72.24 | 2 405.74 | 18.35 | 273.76 |
Messidor | 10.76 | 205.45 | 103.27 | 22.68 |
Cane | 915.60 | 10 835.40 | 7 581.36 | 1 093.38 |
Tab. 5 Comparison of reduction time of four algorithms
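One way to read the reduction-time table is to express each algorithm's average time relative to ARSDM. The sketch below does only that arithmetic on the Average row; the time unit is not stated on this page, but a ratio does not depend on it. The names `avg_time` and `ratio_to` are illustrative assumptions, not part of the paper.

```python
# Average reduction times copied from the Average row of the table above
# (the unit is not given on this page; ratios are unit-independent).
avg_time = {"DMG": 107.94, "FFRS": 2016.07, "GBNRS": 7246.94, "ARSDM": 156.09}


def ratio_to(reference, times):
    """Express every entry as a multiple of the reference algorithm's time."""
    base = times[reference]
    return {name: value / base for name, value in times.items()}


ratios = ratio_to("ARSDM", avg_time)

# Print from fastest to slowest relative to ARSDM.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x the ARSDM average")
```

Because the Average row is dominated by the largest datasets (Cane, GLI85), these ratios summarize the overall trend and should not be read as a per-dataset speed-up.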