Journal of Computer Applications, 2017, Vol. 37, Issue (10): 2819-2822. DOI: 10.11772/j.issn.1001-9081.2017.10.2819


Probabilistic distribution model based on Wasserstein distance for nonlinear dimensionality reduction

CAO Xiaolu, XIN Yunhong   

  1. School of Physics and Information Technology, Shaanxi Normal University, Xi'an Shaanxi 710119, China
  • Received: 2017-04-18; Revised: 2017-07-04; Online: 2017-10-10; Published: 2017-10-16
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (11374199, 11574192).

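  • Corresponding author: XIN Yunhong, E-mail: xinyh@snnu.edu.cn
  • About the authors: CAO Xiaolu (1993-), female, native of Xi'an, Shaanxi; master's student; research interests: data dimensionality reduction and visualization, machine learning. XIN Yunhong (1967-), male, native of Xi'an, Shaanxi; professor, Ph.D.; research interests: data dimensionality reduction and visualization, weak photoelectric signal detection and processing, passive target localization and tracking, multi-sensor information fusion.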

Abstract: Dimensionality reduction plays an important role in big data analysis and visualization. Many dimensionality reduction techniques based on probabilistic distribution models work by optimizing a cost function between the low-dimensional model distribution and the high-dimensional real distribution. The key issue for this type of technique is to efficiently construct the probabilistic distribution model that best represents the features of the original high-dimensional dataset. In this paper, the Wasserstein distance was introduced into dimensionality reduction, and a novel method named Wasserstein Embedded Map (W-map) was presented for high-dimensional data reduction and visualization. W-map converts the dimensionality reduction problem into an optimal transportation problem by constructing similar Wasserstein flows in the high-dimensional dataset and in its corresponding low-dimensional representation. The best-matched low-dimensional visualization is then found by solving the optimal transportation problem of the Wasserstein distance. Experimental results on three different datasets demonstrate that, compared with traditional probabilistic distribution models, the presented method produces accurate and robust dimensionality reduction and visualization results for high-dimensional data.
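For reference, the standard discrete (Kantorovich) form of the Wasserstein-p distance between two weighted point sets is written out below. The abstract does not give W-map's exact formulation. Reading a "Wasserstein flow" as the transport plan $\pi$ is therefore an assumption made here, and $a$, $b$, $d$ are generic symbols rather than the paper's notation.

```latex
\begin{aligned}
\mu &= \sum_{i=1}^{n} a_i\,\delta_{x_i}, \qquad \nu = \sum_{j=1}^{m} b_j\,\delta_{y_j},\\
\Pi(a,b) &= \bigl\{\pi \in \mathbb{R}_{\ge 0}^{n\times m} : \pi\mathbf{1}_m = a,\ \pi^{\top}\mathbf{1}_n = b\bigr\},\\
W_p(\mu,\nu) &= \Bigl(\min_{\pi \in \Pi(a,b)} \sum_{i=1}^{n}\sum_{j=1}^{m} \pi_{ij}\, d(x_i,y_j)^{p}\Bigr)^{1/p}.
\end{aligned}
```

When $n = m$ and all weights equal $1/n$, some optimal $\pi$ is a permutation matrix scaled by $1/n$ (Birkhoff-von Neumann theorem). In that case the minimization reduces to an assignment problem.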

Key words: dimensionality reduction, Wasserstein distance, optimal transportation problem, nonlinear technique, probabilistic model

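Abstract (Chinese): Dimensionality reduction is a core problem in big data analysis and visualization. Dimensionality reduction algorithms based on probabilistic distribution models achieve the reduction by optimizing a cost function between the high-dimensional data model and the low-dimensional data model, so the core of this strategy is constructing the probabilistic distribution model that best reflects the features of the data. On this basis, the Wasserstein distance was introduced into dimensionality reduction, and W-map, a nonlinear dimensionality reduction algorithm based on a Wasserstein-distance probabilistic distribution model, was proposed. W-map builds similar Wasserstein flows in the high-dimensional data space and in its corresponding low-dimensional data space, turning dimensionality reduction into a minimum transportation problem. While solving the Wasserstein distance minimization problem, it searches for the best-matching low-dimensional projection, following the principle that the Wasserstein flow model of the data in the high-dimensional space should be the same as in the low-dimensional space. Experimental results on three different datasets show that, compared with traditional probabilistic distribution models, W-map produces highly accurate and robust dimensionality reduction and visualization results for high-dimensional data.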
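The following is a minimal sketch of the Wasserstein distance minimization that W-map builds on. It is not the W-map algorithm itself. It assumes two point sets of equal size, in the same Euclidean space, with uniform weights. For such inputs an optimal transport plan is a permutation, so an exhaustive search over permutations gives the exact distance. The helper name `wasserstein_p_uniform` is hypothetical.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def wasserstein_p_uniform(
    xs: Sequence[Point], ys: Sequence[Point], p: float = 2.0
) -> Tuple[float, Tuple[int, ...]]:
    """Exact W_p distance between two uniform empirical distributions.

    Hypothetical helper for illustration only (not the paper's W-map code).
    Assumes len(xs) == len(ys) == n, every point carries weight 1/n, and all
    points live in the same Euclidean space. For such inputs an optimal
    transport plan is a permutation, so enumerating permutations is exact,
    but it costs O(n! * n) and is only usable for very small n.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty point sets of equal size")
    if p < 1:
        raise ValueError("p must be >= 1")

    best_cost = math.inf
    best_perm: Tuple[int, ...] = tuple(range(n))
    for perm in itertools.permutations(range(n)):
        # Average cost when x_i sends all of its mass 1/n to y_{perm[i]}.
        cost = sum(math.dist(xs[i], ys[j]) ** p for i, j in enumerate(perm)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # W_p is the p-th root of the minimal average transport cost.
    return best_cost ** (1.0 / p), best_perm
```

For example, `wasserstein_p_uniform([(0.0,), (2.0,)], [(1.0,), (3.0,)], p=1)` returns `(1.0, (0, 1))`: each point moves one unit to its nearest partner. The brute-force search is chosen only for transparency. For larger equal-size inputs, running an assignment solver such as `scipy.optimize.linear_sum_assignment` on the cost matrix `d(x_i, y_j)**p` finds the same optimum in polynomial time.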

