High-dimensional data clustering algorithm with subspace optimization

doi:10.11772/j.issn.1001-9081.2014.08.2279

Journal of Computer Applications, 2014, Vol. 34, Issue 8: 2279-2284.

• Artificial intelligence •


WU Tao,CHEN Lifei,GUO Gongde

School of Mathematics and Computer Science, Fujian Normal University, Fuzhou Fujian 350007, China

Received: 2014-01-06 Revised: 2014-04-04 Online: 2014-08-01 Published: 2014-08-10
Contact: WU Tao

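About the authors: WU Tao (born 1990), male, from Longyan, Fujian, master's student; main research interest: data mining. CHEN Lifei (born 1972), male, from Changle, Fujian, associate professor, Ph.D.; main research interests: data mining, machine learning. GUO Gongde (born 1965), male, from Longyan, Fujian, professor, Ph.D.; main research interests: artificial intelligence, data mining, machine learning.
Supported by:
National Natural Science Foundation of China; Shenzhen Basic Research (Key) Project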

Abstract:

Most existing soft subspace clustering algorithms do not consider how to optimize the projected subspaces of the clusters. To address this problem, a new soft subspace clustering algorithm was proposed. Maximizing the deviation among feature weights was taken as the subspace optimization goal, and a quantitative formula for it was presented. On this basis, a new objective function was designed that minimizes the within-cluster compactness measure while optimizing the soft subspace associated with each cluster. A new expression for computing the feature weights was derived mathematically, and the clustering algorithm was then defined within the framework of classical k-means. Experimental results show that optimizing the subspaces reduces the probability of the algorithm becoming trapped prematurely in a local optimum and improves the stability of the clustering results. The algorithm also shows good performance and clustering effectiveness, which makes it suitable for cluster analysis of high-dimensional data.
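The abstract does not state the derived feature-weight expression or the quantitative formula for weight deviation, so the following is only a minimal sketch of the general soft subspace k-means loop, not the authors' algorithm. As a stand-in for the paper's weight update it uses an entropy-style rule (a softmax over negative per-feature dispersion) of the kind found in earlier soft subspace k-means variants. The names `soft_subspace_kmeans`, `weight_variance`, `gamma`, and `init_centers` are hypothetical, and the variance of a weight vector is shown only as one possible way to quantify how far the weights deviate from uniform.

```python
import math
import random


def soft_subspace_kmeans(points, k, gamma=1.0, iters=20, init_centers=None, seed=0):
    """Generic soft subspace k-means sketch: one feature-weight vector per cluster."""
    rng = random.Random(seed)
    dim = len(points[0])
    if init_centers is None:
        centers = [list(p) for p in rng.sample(points, k)]
    else:
        centers = [list(c) for c in init_centers]
    weights = [[1.0 / dim] * dim for _ in range(k)]  # start from uniform weights
    labels = [0] * len(points)

    for _ in range(iters):
        # Assignment step: nearest center under each cluster's own weighted distance.
        for i, x in enumerate(points):
            dists = [
                sum(w * (a - c) ** 2 for w, a, c in zip(weights[idx], x, centers[idx]))
                for idx in range(k)
            ]
            labels[i] = dists.index(min(dists))

        for idx in range(k):
            members = [x for x, lab in zip(points, labels) if lab == idx]
            if not members:
                continue  # empty cluster: keep its previous center and weights
            # Center update: per-feature mean of the cluster members.
            centers[idx] = [sum(col) / len(members) for col in zip(*members)]
            # Per-feature dispersion of the members around the new center.
            disp = [
                sum((x[j] - centers[idx][j]) ** 2 for x in members)
                for j in range(dim)
            ]
            # Assumed stand-in weight update: softmax of -dispersion / gamma.
            shift = min(disp)  # subtract the minimum for numerical stability
            raw = [math.exp(-(d - shift) / gamma) for d in disp]
            total = sum(raw)
            weights[idx] = [r / total for r in raw]

    return labels, centers, weights


def weight_variance(w):
    """Variance of one cluster's weights; an illustrative spread measure only."""
    mean = sum(w) / len(w)
    return sum((v - mean) ** 2 for v in w) / len(w)
```

With data whose clusters are tight in one feature and spread out in another, the tight feature should receive almost all of the weight, and `weight_variance` grows as a cluster's weights move away from the uniform vector. The `init_centers` argument is there so that a run can be made reproducible; k-means-type methods remain sensitive to initialization, which is the local-optimum issue the abstract refers to.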

CLC Number: TP181

WU Tao, CHEN Lifei, GUO Gongde. High-dimensional data clustering algorithm with subspace optimization[J]. Journal of Computer Applications, 2014, 34(8): 2279-2284.

