Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (7): 1896-1900. DOI: 10.11772/j.issn.1001-9081.2019122075

• Artificial Intelligence •

Unsupervised feature selection method based on regularized mutual representation

WANG Zhiyuan, JIANG Ailian, MUHAMMAD Osman

  1. College of Information and Computer, Taiyuan University of Technology, Jinzhong Shanxi 030600, China
  • Received: 2019-12-09 Revised: 2020-02-24 Online: 2020-07-10 Published: 2020-03-26
  • Corresponding author: JIANG Ailian
  • About the authors: WANG Zhiyuan (1992-), male, born in Suzhou, Anhui, M. S. candidate, research interests: machine learning and feature selection; JIANG Ailian (1969-), female, born in Taiyuan, Shanxi, Ph. D., associate professor, CCF member, research interests: artificial intelligence, big data, feature selection and computer vision; MUHAMMAD Osman (1993-), male, born in Ethiopia, M. S. candidate, research interests: deep learning and image processing.
  • Supported by:
    This work is partially supported by the Research Project of Shanxi Scholarship Council of China (2017-051).


Abstract: Redundant features in high-dimensional data degrade the training efficiency and generalization ability of machine learning models. To improve pattern recognition accuracy and reduce computational complexity, an unsupervised feature selection method based on the Regularized Mutual Representation (RMR) property was proposed. Firstly, the correlations between features were exploited to build a Frobenius-norm-constrained mathematical model for unsupervised feature selection. Then, a divide-and-conquer ridge regression algorithm was designed to optimize the model efficiently. Finally, the importance of each feature was jointly evaluated from the optimal solution of the model, and a representative feature subset was selected from the original data. In clustering accuracy, the RMR method outperforms the Laplacian method by 7 percentage points, the Nonnegative Discriminative Feature Selection (NDFS) method by 7 percentage points, the Regularized Self-Representation (RSR) method by 6 percentage points, and the Self-Representation Feature Selection (SR_FS) method by 3 percentage points. In redundancy rate, the RMR method is 10 percentage points lower than the Laplacian method, 7 percentage points lower than NDFS, 3 percentage points lower than RSR, and 2 percentage points lower than SR_FS. The experimental results show that the RMR method can effectively select important features, reduce the redundancy rate of the data, and improve the clustering accuracy of samples.

Key words: feature selection, unsupervised learning, divide-and-conquer algorithm, ridge regression, regularization
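The pipeline the abstract describes — regress each feature on the remaining features with ridge regression (one independent closed-form subproblem per feature, which is what makes a divide-and-conquer scheme possible), then rank features by how strongly they participate in reconstructing the others — can be sketched as follows. This is an illustrative reconstruction, not the authors' published algorithm: the regularization weight `lam`, the per-feature closed-form solver, and the row-norm scoring rule are assumptions made for the sketch.

```python
import numpy as np

def rmr_feature_selection(X, n_selected, lam=1.0):
    """Illustrative mutual-representation feature selection.

    Each feature (column of X) is regressed on all remaining features
    with ridge regression. A feature that helps reconstruct many other
    features accumulates large coefficients, so its row norm in the
    coefficient matrix W serves as an importance score.
    """
    n_samples, n_features = X.shape
    # W[i, j] = weight of feature i when representing feature j
    W = np.zeros((n_features, n_features))
    for j in range(n_features):
        others = [i for i in range(n_features) if i != j]
        A = X[:, others]        # predictors: every feature except j
        y = X[:, j]             # target: feature j itself
        # Ridge closed form: w = (A^T A + lam * I)^{-1} A^T y
        w = np.linalg.solve(A.T @ A + lam * np.eye(n_features - 1), A.T @ y)
        W[others, j] = w
    scores = np.linalg.norm(W, axis=1)          # row norms as importance
    return np.argsort(scores)[::-1][:n_selected]  # indices of top features
```

Because the subproblems are independent, the per-feature regressions could be solved in parallel or in blocks, which is presumably where the divide-and-conquer speedup in the paper comes from; here they are simply run in a loop for clarity.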

CLC number: