Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (12): 3444-3449. DOI: 10.11772/j.issn.1001-9081.2018050954


Low rank non-linear feature selection algorithm

ZHANG Leyuan, LI Jiaye, LI Pengqing   

  1. College of Computer Science and Information Engineering, Guangxi Normal University, Guilin Guangxi 541004, China
  • Received: 2018-05-08 Revised: 2018-06-29 Online: 2018-12-15 Published: 2018-12-10
  • Contact: ZHANG Leyuan (张乐园)
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2016YFB1000905), the National Natural Science Foundation of China (61170131, 61263035, 61573270, 90718020), the National Basic Research Program (973 Program) of China (2013CB329404), the China Postdoctoral Science Foundation (2015M570837), and the Natural Science Foundation of Guangxi (2015GXNSFCB139011, 2015GXNSFAA139306).


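  • About the authors: ZHANG Leyuan (张乐园), born in 1995, male, from Mengcheng, Anhui, M.S. candidate; main research interests: machine learning and data mining. LI Jiaye (李佳烨), born in 1993, male, from Jincheng, Shanxi, M.S. candidate; main research interests: machine learning and data mining. LI Pengqing (李鹏清), born in 1993, male, from Shenzhen, Guangdong, M.S. candidate; main research interests: machine learning and data mining.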

Abstract: To address the non-linearity, low-rank structure, and feature redundancy often found in high-dimensional data, an unsupervised feature selection algorithm based on kernel functions and feature self-expression was proposed, named the Low Rank Non-linear Feature Selection (LRNFS) algorithm. Firstly, each feature was mapped to a high-dimensional kernel space, so that linear feature selection in the kernel space achieved non-linear feature selection in the original low-dimensional space. Then, a bias term was introduced into the self-expression form, and low-rank and sparse constraints were imposed on the coefficient matrix. Finally, a sparse regularization term on the coefficient vector of the kernel matrices was introduced to perform feature selection. In the proposed algorithm, the kernel matrices represent the non-linear relationships in the data, the low-rank constraint takes the global information of the data into account for subspace learning, and the self-expression form determines the importance of each feature. The experimental results show that, compared with the semi-supervised feature selection algorithm via Rescaled Linear Square Regression (RLSR), the classification accuracy of the proposed algorithm after feature selection is increased by 2.34%. The proposed algorithm addresses the problem that data are linearly inseparable in the low-dimensional feature space, and improves the accuracy of feature selection.

Key words: feature selection, kernel function, subspace learning, low rank representation, sparse processing
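
The first step named in the abstract, mapping each feature to a kernel space, can be illustrated with a minimal sketch. The abstract does not state which kernel was used, so the Gaussian kernel, the bandwidth `sigma`, and the function name `per_feature_gaussian_kernels` are all assumptions made for illustration; the low-rank, sparse, and self-expression terms of LRNFS are not reproduced here.

```python
import numpy as np

def per_feature_gaussian_kernels(X, sigma=1.0):
    """Build one n x n Gaussian kernel matrix per feature.

    X is an (n_samples, n_features) data matrix. The Gaussian kernel and
    the bandwidth `sigma` are assumptions of this sketch, not choices
    confirmed by the abstract.
    Returns an array of shape (n_features, n_samples, n_samples).
    """
    X = np.asarray(X, dtype=float)
    cols = X.T  # shape (d, n): one row per feature
    # diffs[j, i, k] = X[i, j] - X[k, j], the pairwise gap within feature j
    diffs = cols[:, :, None] - cols[:, None, :]
    return np.exp(-diffs ** 2 / (2.0 * sigma ** 2))

# Toy data: 5 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = per_feature_gaussian_kernels(X)
print(K.shape)  # (3, 5, 5)
```

Each slice `K[j]` is a symmetric matrix with ones on the diagonal. Under the assumptions above, a kernel-based selector could then weight these per-feature matrices and drop features whose weights shrink to zero.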

