Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Distributed power iteration clustering based on GraphX
ZHAO Jun, XU Xiaoyan
Journal of Computer Applications    2016, 36 (10): 2710-2714.   DOI: 10.11772/j.issn.1001-9081.2016.10.2710
Abstract479)      PDF (706KB)(566)       Save
Concerning the cumbersome programming and low efficiency in parallel power iteration clustering algorithm, a new method for power iteration clustering in distributed environment was put forward based on Spark, a general computational engine for large-scale data processing, and its component GraphX. Firstly, the raw data was transformed into an affinity matrix which can be viewed as a graph by using some kind of similarity measure ment method. Secondly, by using vertex-cut technology, the row-normalized affinity matrix was divided into a number of subgraphs, which were stored on different machines of a cluster. Finally, using the in-memory computational framework Spark, several iterations were performed on the subgraphs stored in the cluster to get a cut of the original graph, and each subgraph of the original graph corresponded to a cluster. The experiments were carried out on datasets with different sizes and different number of executors. Experimental results show that the proposed distributed power iteration clustering algorithm has a good scalability, its running time is negatively correlated with the number of executors, the speedup of the algorithm ranges between 2.09 to 3.77 in a cluster of 6 executors compared with a single executor. Meanwhile, compared with the Hadoop-based power iteration clustering version, the running time of the proposed algorithm decreased significantly by 61% when dealing with 40000 pieces of news.
Reference | Related Articles | Metrics