Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2657-2664. DOI: 10.11772/j.issn.1001-9081.2022091404
• The 10th CCF Conference on Big Data (2022) •
Tian HE1, Zongxin SHEN1, Qianqian HUANG2, Yanyong HUANG1
Received:
2022-09-20
Revised:
2022-10-27
Accepted:
2022-11-03
Online:
2023-09-10
Published:
2023-09-10
Contact:
Yanyong HUANG
About author:
HE Tian, born in 1995 in Chongqing, M. S. His research interests include data mining.
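Abstract:
Most existing multi-view unsupervised feature selection methods share a common weakness: the sample similarity matrix, the view weight matrix, and the feature weight matrix are usually predefined, so they cannot effectively characterize the true structure of the data or reflect the importance of different views and features, and useful features may therefore not be selected. To address this problem, view weights and feature weights are first learned adaptively on the basis of multi-view fuzzy C-means clustering, so that feature selection is performed while clustering performance is preserved. Then, the sample similarity matrix is learned adaptively under a Laplacian rank constraint, and an Adaptive Learning-based Multi-view Unsupervised Feature Selection (ALMUFS) method is constructed. Finally, an alternating iterative optimization algorithm is designed to solve the objective function, and the proposed method is compared with six unsupervised feature selection baseline methods on eight real-world datasets. Experimental results show that ALMUFS outperforms the other methods in clustering accuracy and F-measure. Compared with Adaptive Collaborative Similarity Learning (ACSL), it improves the two metrics by 8.99 and 11.87 percentage points on average; compared with Adaptive Similarity and View Weight (ASVM), it improves them by 11.09 and 13.21 percentage points on average. These results verify the feasibility and effectiveness of the proposed method.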
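The abstract builds on fuzzy C-means clustering with learned feature weights. As a rough illustration only, the sketch below computes standard fuzzy C-means memberships for a single view using a feature-weighted squared distance. The function name `fuzzy_memberships`, the fuzzifier value of 2, and the way the weights enter the distance are assumptions made for this example; they are not taken from the ALMUFS objective function.

```python
def fuzzy_memberships(samples, centers, feature_weights, fuzzifier=2.0):
    """Return an n x c membership matrix for one view (generic FCM rule).

    Hypothetical helper: the weighted distance below is an illustrative
    assumption, not the update rule derived in the paper.
    """
    eps = 1e-12  # avoids division by zero when a sample sits on a center
    exponent = 1.0 / (fuzzifier - 1.0)
    memberships = []
    for x in samples:
        # Feature-weighted squared distance from x to every cluster center.
        dists = [
            sum(w * (xf - vf) ** 2 for w, xf, vf in zip(feature_weights, x, v)) + eps
            for v in centers
        ]
        # Standard FCM rule on squared distances:
        # u_k = 1 / sum_j (d_k / d_j) ** (1 / (fuzzifier - 1)).
        row = [1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists]
        memberships.append(row)
    return memberships


samples = [[0.0, 0.0], [1.0, 5.0], [4.0, 0.0]]
centers = [[0.0, 0.0], [4.0, 0.0]]
# A weight of 0 on the second feature means it is ignored, i.e. "not selected".
Y = fuzzy_memberships(samples, centers, feature_weights=[1.0, 0.0])
for row in Y:
    print([round(u, 3) for u in row])
```

Each row of the result sums to 1, and setting a feature weight to zero removes that feature's influence on the memberships, which is the intuition behind coupling feature selection with clustering.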
Tian HE, Zongxin SHEN, Qianqian HUANG, Yanyong HUANG. Adaptive learning-based multi-view unsupervised feature selection method[J]. Journal of Computer Applications, 2023, 43(9): 2657-2664.
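| Symbol | Meaning |
|---|---|
| | Data of the v-th view in the dataset |
| | The i-th sample of the v-th view in the dataset |
| Y | Fuzzy membership matrix |
| | Fuzzy membership of the i-th sample in the k-th cluster |
| n | Number of samples |
| m | Number of views |
| c | Number of classes |
| S | Sample similarity matrix |
| | Element in row i, column j of the sample similarity matrix S |
| | Number of features in the v-th view |
| | View weight of the v-th view |
| | Feature weight vector of the v-th view |
| α | Hyperparameter controlling the sparsity of the similarity matrix S |
| β | Hyperparameter controlling the sparsity of the feature weight vectors |
| | Hyperparameter for fusing the two clustering processes |
| γ | Hyperparameter controlling the sparsity of the views |
| | Hyperparameter controlling the number of connected components in the similarity matrix S |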
Tab. 1 Symbols and their meanings
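The last row of Tab. 1 ties a hyperparameter to the number of connected components of S. A standard result of spectral graph theory is that, for a nonnegative symmetric similarity matrix, the multiplicity of the eigenvalue 0 of its graph Laplacian equals the number of connected components, so a rank constraint on the Laplacian fixes how many groups the similarity graph splits into. The sketch below counts the components of a small similarity matrix by breadth-first search. The function name `count_components` and the toy matrix are illustrative assumptions, not part of the paper.

```python
from collections import deque


def count_components(similarity, tol=0.0):
    """Count connected components of the graph whose edge weights are in S.

    Two samples are linked when their similarity exceeds `tol`.
    """
    n = len(similarity)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                # Treat S as symmetric: an edge exists if either entry is positive.
                if not seen[j] and max(similarity[i][j], similarity[j][i]) > tol:
                    seen[j] = True
                    queue.append(j)
    return components


# Toy 5-sample similarity matrix with two blocks: {0, 1, 2} and {3, 4}.
S = [
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.6, 0.0, 0.4, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
n_components = count_components(S)
# Number of components, and the Laplacian rank this implies (n minus components).
print(n_components, len(S) - n_components)
```

With c classes, requiring the Laplacian of an n-sample graph to have rank n - c is equivalent to requiring exactly c components, which is why the learned S can expose the cluster structure directly.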
Dataset | Number of views | Number of samples | Number of classes | Feature dimensions |
---|---|---|---|---|
Yale | 4 | 165 | 15 | 256/256/256/256 |
MSRCV1 | 6 | 210 | 7 | 1203/48/512/100/256/210 |
Politics | 7 | 348 | 7 | 1047/1051/348/348/348/348/348 |
Politicsuk | 3 | 419 | 5 | 2879/419/419 |
WikipediaArticles | 2 | 693 | 10 | 128/10 |
Rugby | 2 | 854 | 15 | 854/854 |
WebKB | 2 | 1051 | 2 | 2949/334 |
Caltech101-7 | 6 | 1474 | 7 | 48/40/254/1984/512/928 |
Tab. 2 Statistics of datasets
Dataset | All-feature | ALMUFS | ACSL | ASVM | CGMV-UFS | OMVFS | LS |
---|---|---|---|---|---|---|---|
Yale | 44.65±4.01 | 36.24±2.68* | 41.49±4.51* | 36.93±2.64* | 36.38±3.35* | 39.82±4.64* | |
MSRCV1 | 68.70±8.21* | 72.83±8.41 | 63.17±9.02* | 55.56±6.87* | 61.98±7.28* | 68.27±8.84 | |
Politics | 49.83±6.89* | 63.65±9.02 | 50.76±4.82* | 47.84±4.80* | 44.01±3.94* | 42.27±4.21* | |
Politicsuk | 72.11±7.15 | 63.52±6.26* | 52.00±5.03* | 48.23±2.03* | 56.78±7.73* | 59.90±4.77* | |
WikipediaArticles | 23.71±1.36* | 45.00±3.55 | 25.83±2.28* | 20.34±1.05* | 27.19±1.95* | 25.91±3.34* | |
Rugby | 44.31±6.55* | 52.08±5.16 | 36.89±5.79* | 39.48±4.94* | 27.70±3.81* | 40.27±6.01* | |
WebKB | 78.01±0.18 | 63.46±0.83* | 66.37±2.79* | 66.64±0.74* | 66.56±1.60* | 66.06±1.70* | |
Caltech101-7 | 51.93±6.67 | 54.21±7.60 | 52.42±6.49 | 50.60±7.36* | 50.66±4.88 | 52.27±5.87 |
Tab. 3 ACC results with feature selection ratio of 0.4 (%)
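Clustering accuracy (ACC) is usually computed after mapping predicted cluster labels to ground-truth classes with the best one-to-one assignment. The sketch below does this by brute force over label permutations, which is only practical for a handful of clusters; practical implementations typically use the Hungarian algorithm instead. The function name `clustering_accuracy` and the toy labels are assumptions, and the evaluation code behind Tab. 3 is not shown on this page.

```python
from itertools import permutations


def clustering_accuracy(true_labels, pred_labels):
    """Best-match accuracy in percent between cluster labels and class labels."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # Pad the shorter side with None so that a one-to-one mapping always exists.
    size = max(len(classes), len(clusters))
    classes += [None] * (size - len(classes))
    clusters += [None] * (size - len(clusters))
    best = 0
    for perm in permutations(classes):
        # Candidate mapping from predicted cluster label to class label.
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return 100.0 * best / len(true_labels)


true_labels = [0, 0, 0, 1, 1, 2]
pred_labels = [1, 1, 0, 2, 2, 0]  # cluster ids are arbitrary names
print(round(clustering_accuracy(true_labels, pred_labels), 2))
```

Because cluster ids carry no meaning, a clustering that merely renames the classes scores 100, while the toy example above matches five of six samples under its best mapping.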
Dataset | All-feature | ALMUFS | ACSL | ASVM | CGMV-UFS | OMVFS | LS |
---|---|---|---|---|---|---|---|
Yale | 50.06±4.05 | 38.63±2.61* | 44.90±3.72* | 39.56±2.15* | 39.48±2.81* | 42.34±3.73* | |
MSRCV1 | 72.58±6.94* | 76.82±6.07 | 68.57±7.14* | 59.44±5.62* | 66.22±5.57* | 72.88±6.71* | |
Politics | 53.00±5.52* | 72.02±6.29 | 53.00±4.00* | 50.52±4.47* | 47.08±2.94* | 46.40±4.00* | |
Politicsuk | 78.71±11.60 | 63.12±3.53* | 59.36±4.21* | 53.01±3.40* | 63.19±7.00* | 63.39±2.68* | |
WikipediaArticles | 25.83±1.45* | 46.72±2.94 | 26.66±1.83* | 21.72±0.92* | 29.60±1.99* | 27.17±2.67* | |
Rugby | 47.70±5.35* | 56.14±4.28 | 40.87±4.40* | 42.35±4.27* | 33.52±2.59* | 44.82±5.11* | |
WebKB | 87.45±0.38 | 65.57±0.55* | 68.25±3.90* | 67.71±0.50* | 67.63±1.17* | 66.70±1.09* | |
Caltech101-7 | 55.49±5.27 | 57.35±5.19 | 54.36±5.59* | 53.41±6.28* | 54.58±3.23* | 56.29±4.39 |
Tab. 4 F-measure results with feature selection ratio of 0.4 (%)
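The result tables fix the feature selection ratio at 0.4. One plausible reading, sketched below, is that each view keeps the 40% of its features with the largest learned weights. The function name `select_top_features`, the rounding rule, and the toy weights are assumptions made for illustration; the exact selection procedure is not given on this page.

```python
def select_top_features(feature_weights, ratio=0.4):
    """Return sorted indices of the top `ratio` fraction of features by weight."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Assumed rounding rule: round to the nearest integer, keep at least one.
    k = max(1, round(ratio * len(feature_weights)))
    ranked = sorted(
        range(len(feature_weights)),
        key=lambda idx: feature_weights[idx],
        reverse=True,
    )
    return sorted(ranked[:k])


# Hypothetical learned feature weights for two views of different sizes.
weights_per_view = {
    "view_1": [0.05, 0.30, 0.10, 0.40, 0.15],
    "view_2": [0.7, 0.1, 0.2],
}
selected = {name: select_top_features(w) for name, w in weights_per_view.items()}
print(selected)
```

The selected column indices can then be used to slice each view's data matrix before running the clustering step that produces the ACC and F-measure scores.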
1 | LIU H, MOTODA H. Feature Selection for Knowledge Discovery and Data Mining, SECS 454[M]. New York: Springer, 1998: 1-10. 10.1007/978-1-4615-5689-3 |
2 | LI J D, CHENG K W, WANG S H, et al. Feature selection: a data perspective[J]. ACM Computing Surveys, 2017, 50(6): No.94. 10.1145/3136625 |
3 | STAŃCZYK U, JAIN L C. Feature selection for data and pattern recognition: an introduction[M]// Feature Selection for Data and Pattern Recognition, SCI 584. Berlin: Springer, 2015: 1-7. 10.1007/978-3-662-45620-0_1 |
4 | MOHAMAD M A, HASSAN H, NASIEN D, et al. A review on feature extraction and feature selection for handwritten character recognition[J]. International Journal of Advanced Computer Science and Applications, 2015, 6(2): 204-212. 10.14569/ijacsa.2015.060230 |
5 | FENG L, CAI L, LIU Y, et al. Multi-view spectral clustering via robust local subspace learning[J]. Soft Computing, 2017, 21(8): 1937-1948. 10.1007/s00500-016-2120-3 |
6 | CAI J, LUO J W, WANG S L, et al. Feature selection in machine learning: a new perspective[J]. Neurocomputing, 2018, 300: 70-79. 10.1016/j.neucom.2017.11.077 |
7 | NGUYEN B H, XUE B, ZHANG M J. A survey on swarm intelligence approaches to feature selection in data mining[J]. Swarm and Evolutionary Computation, 2020, 54: No.100663. 10.1016/j.swevo.2020.100663 |
8 | ZEBARI R R, ABDULAZEEZ A M, ZEEBAREE D Q, et al. A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction[J]. Journal of Applied Science and Technology Trends, 2020, 1(2): 56-70. 10.38094/jastt1224 |
9 | LUALDI M, FASANO M. Statistical analysis of proteomics data: a review on feature selection[J]. Journal of Proteomics, 2019, 198: 18-26. 10.1016/j.jprot.2018.12.004 |
10 | CHANDRA B, GUPTA M. An efficient statistical feature selection approach for classification of gene expression data[J]. Journal of Biomedical Informatics, 2011, 44(4): 529-535. 10.1016/j.jbi.2011.01.001 |
11 | HE X F, CAI D, NIYOGI P. Laplacian score for feature selection[C]// Proceedings of the 18th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2005: 507-514. |
12 | ZHAO Z A, LIU H. Spectral Feature Selection for Data Mining[M]. New York: Chapman & Hall, 2011: 1-110. |
13 | ZHAO Z, WANG L, LIU H. Efficient spectral feature selection with minimum redundancy[C]// Proceedings of the 24th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2010: 673-678. 10.1609/aaai.v24i1.7671 |
14 | WANG Z, FENG Y F, QI T, et al. Adaptive multi-view feature selection for human motion retrieval[J]. Signal Processing, 2016, 120: 691-701. 10.1016/j.sigpro.2014.11.015 |
15 | TANG J L, HU X, GAO H J, et al. Unsupervised feature selection for multi-view data in social media[C]// Proceedings of the 2013 SIAM International Conference on Data Mining. Philadelphia, PA: SIAM, 2013: 270-278. 10.1137/1.9781611972832.30 |
16 | FENG Y F, XIAO J, ZHUANG Y T, et al. Adaptive unsupervised multi-view feature selection for visual concept recognition[C]// Proceedings of the 2012 Asian Conference on Computer Vision, LNCS 7724. Berlin: Springer, 2013: 343-357. |
17 | DONG X, ZHU L, SONG X M, et al. Adaptive collaborative similarity learning for unsupervised multi-view feature selection[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2018: 2064-2070. 10.24963/ijcai.2018/285 |
18 | HOU C P, NIE F P, TAO H, et al. Multi-view unsupervised feature selection with adaptive similarity and view weight[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(9): 1998-2011. 10.1109/tkde.2017.2681670 |
19 | SHAO W X, HE L F, LU C T, et al. Online unsupervised multi-view feature selection[C]// Proceedings of the IEEE 16th International Conference on Data Mining. Piscataway: IEEE, 2016: 1203-1208. 10.1109/icdm.2016.0160 |
20 | TANG C, CHEN J J, LIU X W, et al. Consensus learning guided multi-view unsupervised feature selection[J]. Knowledge-Based Systems, 2018, 160: 49-60. 10.1016/j.knosys.2018.06.016 |
21 | BEZDEK J C, EHRLICH R, FULL W. FCM: the fuzzy c-means clustering algorithm[J]. Computers and Geosciences, 1984, 10(2/3): 191-203. 10.1016/0098-3004(84)90020-7 |
22 | TANG C, ZHENG X, LIU X W, et al. Cross-view locality preserved diversity and consensus learning for multi-view unsupervised feature selection[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(10): 4705-4716. 10.1109/tkde.2020.3048678 |
23 | MIAO J Y, YANG T J, SUN L J, et al. Graph regularized locally linear embedding for unsupervised feature selection[J]. Pattern Recognition, 2022, 122: No.108299. 10.1016/j.patcog.2021.108299 |
24 | ZHU X F, LI X L, ZHANG S C, et al. Robust joint graph sparse coding for unsupervised spectral feature selection[J]. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(6): 1263-1275. 10.1109/tnnls.2016.2521602 |
25 | SOLORIO-FERNÁNDEZ S, CARRASCO-OCHOA J A, MARTÍNEZ-TRINIDAD J F. A review of unsupervised feature selection methods[J]. Artificial Intelligence Review, 2020, 53(2): 907-948. 10.1007/s10462-019-09682-y |
26 | LUO M N, NIE F P, CHANG X J, et al. Adaptive unsupervised feature selection with structure regularization[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(4): 944-956. 10.1109/tnnls.2017.2650978 |
27 | BELHUMEUR P N, HESPANHA J P, KRIEGMAN D J. Eigenfaces vs. fisherfaces: recognition using class specific linear projection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(7): 711-720. 10.1109/34.598228 |
28 | LEE Y J, GRAUMAN K. Foreground focus: unsupervised learning from partially matching images[J]. International Journal of Computer Vision, 2009, 85(2): 143-166. 10.1007/s11263-009-0252-y |
29 | GREENE D, CUNNINGHAM P. Producing a unified graph representation from multiple social network views[C]// Proceedings of the 5th Annual ACM Web Science Conference. New York: ACM, 2013: 118-121. 10.1145/2464464.2464471 |
30 | YANG Y, WANG H. Multi-view clustering: a survey[J]. Big Data Mining and Analytics, 2018, 1(2): 83-107. 10.26599/bdma.2018.9020003 |
31 | ZONG L, ZHANG X, LIU X, et al. Multi-view clustering on data with partial instances and clusters[J]. Neural Networks, 2020, 129: 19-30. 10.1016/j.neunet.2020.05.021 |
32 | SINDHWANI V, NIYOGI P, BELKIN M. Beyond the point cloud: from transductive to semi-supervised learning[C]// Proceedings of the 22nd International Conference on Machine Learning. New York: ACM, 2005: 824-831. 10.1145/1102351.1102455 |
33 | DUECK D, FREY B J. Non-metric affinity propagation for unsupervised image categorization[C]// Proceedings of the IEEE 11th International Conference on Computer Vision. Piscataway: IEEE, 2007: 1-8. 10.1109/iccv.2007.4408853 |