Stochastic nonlinear dimensionality reduction based on nearest neighbors

doi:10.11772/j.issn.1001-9081.2016.02.0377

Journal of Computer Applications, 2016, Vol. 36, Issue 2: 377-381.


TIAN Shoucai, SUN Xili, LU Yonggang

School of Information Science and Engineering, Lanzhou University, Lanzhou Gansu 730000, China

Received: 2015-08-29 Revised: 2015-09-15 Online: 2016-02-10 Published: 2016-02-03


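Corresponding author: LU Yonggang (born 1974), male, born in Longnan, Gansu, associate professor. His research interests include pattern recognition, artificial intelligence and bioinformatics.
About the authors: TIAN Shoucai (born 1987), male, born in Shangqiu, Henan, M. S. candidate. His research interest is pattern recognition. SUN Xili (born 1990), female, born in Loudi, Hunan, M. S. candidate. Her research interest is pattern recognition.
Foundation item: National Natural Science Foundation of China (61272213).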

Abstract

Linear dimensionality reduction methods usually cannot produce a satisfactory low-dimensional embedding when applied to data with nonlinear structure. To address this problem, a new stochastic nonlinear dimensionality reduction method, named NNSE, was proposed; it focuses on preserving the local nearest neighbor information of the high-dimensional space. Firstly, the nearest neighbors of each sample point were found by calculating the Euclidean distances between sample points in the high-dimensional space, and a random initial distribution of the data points was generated in the low-dimensional space. Secondly, the positions of the data points in the low-dimensional space were iteratively optimized by moving each point towards the mean position of its nearest neighbors (as found in the high-dimensional space), until the embedding became stable. In a comparison with t-SNE (t-distributed Stochastic Neighbor Embedding), a state-of-the-art stochastic nonlinear dimensionality reduction method, the low-dimensional embeddings produced by NNSE are visually similar to those produced by t-SNE. However, a quantitative indicator shows that NNSE is clearly better than t-SNE at preserving the local nearest neighbor information in the low-dimensional embedding.
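The three steps described above (find nearest neighbors in the original space, start from a random low-dimensional layout, repeatedly move each point towards the mean of its neighbors) can be illustrated with the following Python sketch. It is a minimal illustration under stated assumptions, not the authors' NNSE implementation: the abstract gives no formulas, so the hypothetical function names (`knn_indices`, `center_and_orthonormalize`, `nnse_sketch`, `knn_preservation`), the parameters `k`, `step`, `tol` and `max_iter`, and the neighbor-preservation score used as the "quantitative indicator" are all our own choices. Two further assumptions are also ours: the neighbor graph is symmetrized, and the embedding is re-centered and re-orthonormalized after every move, because repeatedly replacing each point by its neighbors' mean would otherwise collapse the embedding to one point per connected component of the neighbor graph. With that normalization the loop behaves like a subspace (power) iteration on the neighbor-averaging operator, which is closer in spirit to Laplacian eigenmaps than to whatever mechanism NNSE itself uses to stay spread out.

```python
import math
import random


def knn_indices(X, k):
    """Return, for each point in X, the indices of its k nearest neighbors by
    Euclidean distance (the point itself is excluded; ties go to the lower index)."""
    n = len(X)
    result = []
    for i in range(n):
        ranked = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def center_and_orthonormalize(Y):
    """ASSUMPTION (not described in the abstract): center every embedding axis,
    make the axes mutually orthogonal (Gram-Schmidt) and give each unit variance.
    Without some constraint like this, repeated neighbor averaging collapses the
    embedding to one point per connected component of the neighbor graph."""
    n, d = len(Y), len(Y[0])
    basis = []
    for c in range(d):
        v = [row[c] for row in Y]
        mean = sum(v) / n
        v = [x - mean for x in v]
        for b in basis:  # remove the part already captured by earlier axes
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm < 1e-12:
            raise ValueError("embedding axes became linearly dependent")
        basis.append([x / norm for x in v])
    scale = math.sqrt(n)  # unit-norm columns -> unit variance per axis
    return [[basis[c][i] * scale for c in range(d)] for i in range(n)]


def nnse_sketch(X, dim=2, k=5, step=0.5, tol=1e-6, max_iter=500, seed=0):
    """Hypothetical nearest-neighbor averaging embedding; k, step, tol and
    max_iter are illustrative choices, not values from the paper."""
    n = len(X)
    knn = knn_indices(X, k)          # step 1: neighbors in the original space
    graph = [set(nb) for nb in knn]  # ASSUMPTION: symmetrize the kNN graph
    for i, nb in enumerate(knn):
        for j in nb:
            graph[j].add(i)
    rng = random.Random(seed)        # step 2: random low-dimensional start
    Y = center_and_orthonormalize(
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)])
    for _ in range(max_iter):        # step 3: move towards neighbor means
        moved = []
        for i in range(n):
            nbrs = graph[i]
            mean = [sum(Y[j][c] for j in nbrs) / len(nbrs) for c in range(dim)]
            moved.append([(1 - step) * Y[i][c] + step * mean[c]
                          for c in range(dim)])
        Y_new = center_and_orthonormalize(moved)
        shift = max(abs(a - b) for rn, ro in zip(Y_new, Y) for a, b in zip(rn, ro))
        Y = Y_new
        if shift < tol:              # stop once the embedding is stable
            break
    return Y


def knn_preservation(X, Y, k):
    """ASSUMED indicator: mean fraction of each point's k nearest neighbors in X
    that are also among its k nearest neighbors in the embedding Y."""
    high, low = knn_indices(X, k), knn_indices(Y, k)
    return sum(len(set(h) & set(l)) / k for h, l in zip(high, low)) / len(X)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic 5-D Gaussian blobs (made-up data, for illustration only).
    X = [[rng.gauss(c, 1.0) for _ in range(5)] for c in (0.0, 8.0) for _ in range(20)]
    Y = nnse_sketch(X, dim=2, k=5)
    print("kNN preservation:", round(knn_preservation(X, Y, k=5), 3))
```

Running the file embeds two synthetic blobs and prints the assumed preservation score; the data are invented, so the printed value says nothing about the paper's reported comparison with t-SNE. The step size `step=0.5` keeps every eigenvalue of the averaging update non-negative, which avoids oscillation between iterations.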

Keywords: dimensionality reduction, linear technique, nonlinear technique, nearest neighbor, stochastic method


CLC number: TP181

TIAN Shoucai, SUN Xili, LU Yonggang. Stochastic nonlinear dimensionality reduction based on nearest neighbors[J]. Journal of Computer Applications, 2016, 36(2): 377-381.


