Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (2): 372-376. DOI: 10.11772/j.issn.1001-9081.2016.02.0372

• The 3rd CCF Big Data Conference (CCF BigData 2015) •

Parallel sparse subspace clustering using coordinate descent

WU Jieqi, LI Xiaoyu, YUAN Xiaotong, LIU Qingshan

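  1. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science & Technology, Nanjing 210044
  • Received: 2015-08-29 Revised: 2015-09-17 Online: 2016-02-10 Published: 2016-02-03
  • Corresponding author: LIU Qingshan (born 1975), male, from Lujiang, Anhui; professor, Ph.D.; main research interests: image analysis and pattern recognition.
  • About the authors: WU Jieqi (born 1992), male, from Yichun, Jiangxi; master's student; main research interests: parallel computing and machine learning. LI Xiaoyu (born 1991), male, from Jinzhou, Liaoning; master's student; main research interests: big data processing and parallel computing. YUAN Xiaotong (born 1980), male, from Nantong, Jiangsu; professor, Ph.D.; main research interests: sparse learning, graphical models, and subspace analysis.
  • Supported by:
    National Natural Science Foundation of China (61402232, 61532009, 61522308); Natural Science Foundation of Jiangsu Province (BK20141003, BK2012045).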

Parallel sparse subspace clustering via coordinate descent minimization

WU Jieqi, LI Xiaoyu, YUAN Xiaotong, LIU Qingshan   

  1. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science & Technology, Nanjing Jiangsu 210044, China
  • Received: 2015-08-29 Revised: 2015-09-17 Online: 2016-02-10 Published: 2016-02-03

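Abstract: As data scale keeps growing, sparse subspace clustering (SSC) faces a serious computational challenge. Existing SSC algorithms such as the Alternating Direction Method of Multipliers (ADMM) are usually implemented serially and therefore cannot readily exploit multi-core processors to speed up large-scale clustering. To address this problem, a parallel SSC method based on coordinate descent is proposed. The method exploits the fact that SSC can be modeled as a series of sample-wise sparse self-expression sub-problems and solves each sub-problem by coordinate descent, which has few parameters and converges quickly. Because the self-expression sub-problems are mutually independent, the sub-problems of different samples are solved simultaneously on the individual processor cores, which makes full use of the available computing resources and reduces running time. Comparative experiments against the commonly used ADMM algorithm on simulated data and on the Hopkins-155 motion segmentation dataset show that, on multi-core processors, the proposed algorithm runs significantly faster while achieving clustering accuracy comparable to ADMM.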
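
For orientation, the sample-wise sub-problems mentioned in the abstract can be written down as follows. This is a sketch that assumes the standard Lasso-type self-expression model of SSC; the abstract itself does not state the objective, so the symbols (data matrix $X = [x_1, \dots, x_N]$, coefficient vector $c_i$, regularization weight $\lambda$) and the exact form of the penalty are assumptions, not the authors' notation. The second line is the usual closed-form coordinate update for such a problem, cycled over $j \neq i$ until convergence.

```latex
% Self-expression sub-problem for sample x_i (one per sample, mutually independent)
\min_{c_i \in \mathbb{R}^N} \; \frac{1}{2}\,\lVert x_i - X c_i \rVert_2^2 + \lambda \lVert c_i \rVert_1
\quad \text{s.t.} \quad c_{ii} = 0, \qquad i = 1, \dots, N

% Coordinate descent update of the j-th entry (j \neq i), all other entries held fixed
c_{ji} \leftarrow \frac{S_\lambda\!\Bigl( x_j^\top \bigl( x_i - \sum_{k \neq j} c_{ki}\, x_k \bigr) \Bigr)}{\lVert x_j \rVert_2^2},
\qquad S_\lambda(a) = \operatorname{sign}(a)\,\max\bigl(\lvert a \rvert - \lambda,\, 0\bigr)
```

Since no sub-problem involves the coefficients of another sample, the $N$ problems can be assigned to different cores without any synchronization.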

Key words: sparse subspace clustering, high dimensionality, coordinate descent, parallel optimization, motion segmentation

Abstract: The rapid growth of data scale poses a great computational challenge to Sparse Subspace Clustering (SSC). Existing optimization algorithms for SSC, e.g. the Alternating Direction Method of Multipliers (ADMM), are implemented sequentially and thus cannot exploit multi-core processors to improve computational efficiency. To address this issue, a parallel SSC method based on coordinate descent was proposed, inspired by the simple observation that SSC can be formulated as a sequence of sample-based sparse self-expression sub-problems. The proposed algorithm solves each sub-problem with a coordinate descent algorithm, which has few parameters and converges fast. Because the self-expression sub-problems are independent, they are solved simultaneously on different processor cores, which makes efficient use of computing resources and shortens running time, so the proposed algorithm is suitable for large-scale clustering. Experiments on simulated data and the Hopkins-155 motion segmentation dataset demonstrate that, on multi-core processors, the proposed parallel SSC method significantly improves computational efficiency while achieving clustering accuracy comparable to ADMM.
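
The idea summarized in the abstract (one sparse self-expression sub-problem per sample, each solved by coordinate descent, with the sub-problems distributed over processor cores) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Lasso-type objective, the function names (`soft_threshold`, `gram_matrix`, `solve_subproblem`, `parallel_self_expression`) and the parameters `lam`, `max_sweeps`, `tol` and `executor_cls` are all assumptions made for illustration, and only the standard library is used.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def soft_threshold(a, lam):
    """Soft-thresholding operator: sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def gram_matrix(samples):
    """G[j][k] = <x_j, x_k> for samples given as equal-length lists of numbers."""
    return [[sum(a * b for a, b in zip(xj, xk)) for xk in samples] for xj in samples]


def solve_subproblem(i, gram, lam, max_sweeps=200, tol=1e-8):
    """Coordinate descent for sample i (assumed Lasso-type objective):
    minimize 0.5 * ||x_i - sum_j c_j x_j||^2 + lam * ||c||_1  subject to c_i = 0.
    """
    n = len(gram)
    c = [0.0] * n
    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(n):
            if j == i or gram[j][j] == 0.0:
                continue  # c_i is fixed at 0; all-zero samples are skipped
            # Correlation of x_j with the residual that leaves out x_j's own term.
            rho = gram[j][i] - sum(gram[j][k] * c[k] for k in range(n) if k != j)
            new_cj = soft_threshold(rho, lam) / gram[j][j]
            largest_change = max(largest_change, abs(new_cj - c[j]))
            c[j] = new_cj
        if largest_change < tol:
            break  # a full sweep changed nothing noticeable
    return c


def parallel_self_expression(samples, lam, executor_cls=ProcessPoolExecutor, workers=None):
    """Solve the independent sub-problems concurrently; row i holds the
    coefficients that express sample i in terms of the other samples."""
    gram = gram_matrix(samples)
    solve = partial(solve_subproblem, gram=gram, lam=lam)
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(solve, range(len(samples))))
```

With the default `ProcessPoolExecutor`, each sub-problem runs in a worker process, so different samples can be handled on different cores; on platforms that start workers by spawning, the call has to sit under an `if __name__ == "__main__":` guard. Passing `ThreadPoolExecutor` as `executor_cls` is handy for small tests, but in CPython threads generally give no multi-core speed-up for pure-Python loops like these. The sketch also ships the full Gram matrix to every task, which a real implementation would avoid. The resulting coefficient rows would then feed the usual affinity-matrix and spectral-clustering stage of SSC, which is omitted here.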

Key words: Sparse Subspace Clustering (SSC), high dimensionality, coordinate descent, parallel optimization, motion segmentation

CLC number: