Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (12): 3467-3471. DOI: 10.11772/j.issn.1001-9081.2017.12.3467

• Artificial Intelligence •

Feature selection method of high-dimensional data based on random matrix theory

WANG Yan1, YANG Jun1, SUN Lingfeng1, LI Yunuo2, SONG Baoyan1

  1. College of Information, Liaoning University, Shenyang, Liaoning 110036, China;
  2. Smart City Development Department, Bring Spring Science & Technology Limited Company, Shenyang, Liaoning 110027, China
  • Received: 2017-05-04  Revised: 2017-06-26  Online: 2017-12-10  Published: 2017-12-18
  • Corresponding author: SONG Baoyan
  • About the authors: WANG Yan (1978-), female, born in Fushun, Liaoning, Ph. D., associate professor, CCF member; her research interests include databases, sensory data processing and the Internet of Things. YANG Jun (1992-), male, born in Hefei, Anhui, M. S. candidate; his research interests include machine learning and data mining. SUN Lingfeng (1993-), male, born in Weifang, Shandong, M. S. candidate; his research interests include big data processing. LI Yunuo (1978-), male, born in Zhuanghe, Liaoning, M. S.; his research interests include big data processing and smart cities. SONG Baoyan (1965-), female, born in Tieling, Liaoning, Ph. D., professor, CCF member; her research interests include database theory and big data processing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61472169, 61472072, 61528202, 61501105), the Special Prophase Project on the National Basic Research Program (973) of China (2014CB360509), and the General Scientific Research Project of the Education Department of Liaoning Province (L2015204).

Abstract: Traditional feature selection methods usually remove redundant features by using correlation measures, without considering the large amount of noise contained in a high-dimensional correlation matrix, which seriously affects the feature selection results. To solve this problem, a feature selection method based on Random Matrix Theory (RMT) was proposed. Firstly, the singular values of the correlation matrix that matched the random matrix prediction were removed, which yielded the denoised correlation matrix and the number of features to be selected. Then, singular value decomposition was performed on the denoised correlation matrix, and the correlation between features and classes was obtained from the decomposed matrices. Finally, feature selection was accomplished according to the feature-class correlation and the redundancy between features. In addition, a feature selection optimization method was proposed, which further refines the result by setting each feature as a random variable in turn and comparing its singular value vector with the original singular value vector. The classification experimental results show that the proposed method can effectively improve classification accuracy and reduce the size of the training data.
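
The abstract outlines the procedure but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' code: it assumes the Marchenko-Pastur upper edge (1 + sqrt(p/n))^2 plays the role of the "random matrix prediction", treats eigenvalues of the feature correlation matrix below that edge as noise (for a symmetric correlation matrix the singular values coincide with the eigenvalues), takes the number of remaining components as the number of features to select, and scores feature-class relevance by projecting the feature-class correlation vector onto the retained subspace. The function name `rmt_denoise_and_rank`, the relevance projection, and the mRMR-style greedy step are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rmt_denoise_and_rank(X, y):
    """Illustrative sketch of RMT-based feature selection (assumptions, not the paper's code).

    X: (n samples, p features) numeric data matrix; y: numerically encoded class labels.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # standardize features
    C = (Z.T @ Z) / n                                     # p x p sample correlation matrix

    # Assumed "random matrix prediction": Marchenko-Pastur upper edge for pure noise
    lam_max = (1.0 + np.sqrt(p / n)) ** 2

    # Eigen-decomposition of the symmetric correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)
    signal = eigvals > lam_max          # components exceeding the noise prediction
    k = int(signal.sum())               # number of features to select

    # Denoised correlation matrix: keep only the signal components
    V = eigvecs[:, signal]
    C_denoised = (V * eigvals[signal]) @ V.T

    # Feature-class relevance: project the feature-class correlation vector
    # onto the retained (signal) subspace
    y_std = (y - y.mean()) / (y.std() + 1e-12)
    r_fc = (Z.T @ y_std) / n
    relevance = np.abs(V @ (V.T @ r_fc))

    # Greedy selection trading relevance against redundancy (mRMR-style choice)
    selected, candidates = [], list(range(p))
    while candidates and len(selected) < k:
        def score(j):
            red = np.mean([abs(C_denoised[j, s]) for s in selected]) if selected else 0.0
            return relevance[j] - red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected, C_denoised
```

On a numeric data matrix X with labels y, `selected` would give candidate feature indices under these assumptions; the paper's additional optimization step (re-setting each feature as a random variable in turn and comparing singular value vectors) is not reproduced here.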

Key words: random matrix, feature selection, denoising, singular value, correlation matrix

CLC Number: