计算机应用 ›› 2016, Vol. 36 ›› Issue (2): 377-381.DOI: 10.11772/j.issn.1001-9081.2016.02.0377

• 第三届CCF大数据学术会议(CCF BigData 2015) • 上一篇    下一篇

基于最近邻的随机非线性降维

田守财, 孙喜利, 路永钢   

  1. 兰州大学 信息科学与工程学院, 兰州 730000
  • 收稿日期:2015-08-29 修回日期:2015-09-15 出版日期:2016-02-10 发布日期:2016-02-03
  • 通讯作者: 路永钢(1974-),男,甘肃陇南人,副教授,主要研究方向:模式识别、人工智能、生物信息学。
  • 作者简介:田守财(1987-),男,河南商丘人,硕士研究生,主要研究方向:模式识别;孙喜利(1990-),女,湖南娄底人,硕士研究生,主要研究方向:模式识别。
  • 基金资助:
    国家自然科学基金资助项目(61272213)。

Stochastic nonlinear dimensionality reduction based on nearest neighbors

TIAN Shoucai, SUN Xili, LU Yonggang   

  1. School of Information Science and Engineering, Lanzhou University, Lanzhou Gansu 730000, China
  • Received:2015-08-29 Revised:2015-09-15 Online:2016-02-10 Published:2016-02-03

摘要: 针对线性降维技术应用于具有非线性结构的数据时无法得到令人满意的结果的问题,提出一种新的着重于保持高维空间局部最近邻信息的非线性随机降维算法(NNSE)。该算法首先在高维空间中通过计算样本点之间的欧氏距离找出每个样本点的最近邻点,接着在低维空间中产生一个随机的初始分布;然后通过将低维空间中的样本点不断向其最近邻点的平均位置移动,直到产生稳定的低维嵌入结果。与一种先进的非线性随机降维算法——t分布随机邻域嵌入(t-SNE)相比,NNSE算法得到的低维结果在可视化方面与t-SNE算法相差不大,但通过比较两者的量化指标可以发现,NNSE算法在保持最近邻信息方面上明显优于t-SNE算法。

关键词: 降维, 线性方法, 非线性方法, 最近邻, 随机方法

Abstract: As linear dimensionality reduction methods usually cannot produce satisfactory low-dimensional embedding when applied to data with nonlinear structure, a new nonlinear dimensionality reduction method named NNSE was proposed to keep the local nearest neighbor information in the high-dimensional space. Firstly, the nearest neighbor points were found by calculating the Euclidean distance between the sample points in the high-dimensional space, then a random initial distribution of the data points was generated in the low-dimensional space. Secondly, by moving the data points towards the mean position of their nearest neighbors found in the high-dimensional space, the data point positions were iteratively optimized until the embedding becomes stable. In the comparison with a state-of-the-art nonlinear stochastic dimensionality reduction method named t-SNE (t-distributed Stochastic Neighbor Embedding), the low-dimensional embedding produced by NNSE method is similar to the visualization produced by the t-SNE method. However, it is shown that the NNSE method is superior to t-SNE in preserving the local nearest neighbor information in the low-dimensional embedding by using a quantitative indicator.

Key words: dimensionality reduction, linear technique, nonlinear technique, nearest neighbor, stochastic method

中图分类号: