Journal of Computer Applications, 2021, Vol. 41, Issue (5): 1282-1289. DOI: 10.11772/j.issn.1001-9081.2020071099

Topic: Artificial intelligence


Self-adaptive multi-measure unsupervised feature selection method with structured graph optimization

LIN Junchao, WAN Yuan

  1. School of Science, Wuhan University of Technology, Wuhan Hubei 430070, China
  • Received: 2020-07-28 Revised: 2020-09-16 Published: 2021-05-10 Online: 2020-10-19
  • Corresponding author: WAN Yuan
  • About the authors: LIN Junchao, born in 1997 in Wuhan, Hubei, M.S. candidate. His research interests include machine learning and pattern recognition. WAN Yuan, born in 1976 in Wuhan, Hubei, Ph.D., professor, CCF member. Her research interests include machine learning, image processing and pattern recognition.
  • Supported by:
    This work is partially supported by the Fundamental Research Funds for the Central Universities (2019IB010).

Abstract: Unsupervised feature selection is an active research topic in machine learning and is important for dimensionality reduction and classification of high-dimensional data. The similarity between data points can be measured by several different criteria, which makes it difficult to apply a consistent similarity measure across data points. Moreover, most existing methods obtain the similarity matrix by neighbor assignment, so the number of connected components in the resulting graph is usually not ideal. To address these two problems, a Self-Adaptive Multi-measure unsupervised feature selection method with Structured Graph Optimization (SAM-SGO) was proposed, which treats the similarity matrix as a variable rather than presetting it. Different measure functions were adaptively fused into a unified measure, so that multiple measures were combined, the similarity matrix of the data was obtained adaptively, and the relationships between data points were captured more accurately. To obtain an ideal graph structure, a constraint was imposed on the rank of the similarity matrix, which optimized the local structure of the graph while simplifying the computation. In addition, the graph-based dimensionality reduction problem was incorporated into the proposed adaptive multi-measure problem, and a sparsity-inducing l2,0 regularization constraint was introduced to obtain a sparse projection for feature selection. Experiments on several standard datasets verified the effectiveness of SAM-SGO: compared with the recently proposed Local Learning-based Clustering Feature Selection (LLCFS), Dependence Guided Unsupervised Feature Selection (DGUFS) and Structured Optimal Graph Feature Selection (SOGFS) methods, the clustering accuracy of SAM-SGO was improved by about 3.6 percentage points on average.

Key words: Self-Adaptive Multi-measure (SAM), Structured Graph Optimization (SGO), subspace learning, sparse regularization constraint, unsupervised feature selection
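
The two structural ideas named in the abstract, a rank constraint that controls the number of connected components and a row-sparse projection used to pick features, can be illustrated with a small Python sketch. This is not the SAM-SGO algorithm itself: it assumes, as is common in structured-graph methods, that the rank constraint acts on the graph Laplacian of the similarity matrix (rank = n - c for c connected components), and the function names, the toy similarity matrix `S`, and the toy projection `W` are all hypothetical.

```python
import numpy as np


def laplacian(S):
    """Unnormalized Laplacian L = D - A of the symmetrized similarity matrix."""
    A = (S + S.T) / 2.0
    return np.diag(A.sum(axis=1)) - A


def count_components(S, tol=1e-8):
    """Connected components = multiplicity of the zero eigenvalue of L."""
    eigvals = np.linalg.eigvalsh(laplacian(S))
    return int(np.sum(eigvals < tol))


def l20_norm(W, tol=1e-12):
    """l2,0 'norm': the number of rows of W whose l2 norm is non-zero."""
    return int(np.sum(np.linalg.norm(W, axis=1) > tol))


def select_features(W, k):
    """Indices of the k features whose rows in W have the largest l2 norm."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(-row_norms, kind="stable")[:k]


# Toy similarity matrix (assumed): points {0, 1} and {2, 3, 4} form two groups.
S = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

# Toy projection (assumed): 4 features mapped to 2 dimensions, 2 rows are zero.
W = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [0.0, 0.0],
    [1.0, 0.0],
])

n = S.shape[0]
c = count_components(S)
print("components:", c, "Laplacian rank:", np.linalg.matrix_rank(laplacian(S)), "n - c:", n - c)
print("l2,0 norm of W:", l20_norm(W))
print("selected features:", select_features(W, 2))
```

In this sketch a graph with exactly c components has a Laplacian of rank n - c, which is why constraining the rank fixes the cluster structure of the learned graph, and features whose rows in W are zero contribute nothing to the projection and are therefore discarded.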

CLC number: