
An Improved Density Clustering Algorithm Based on MapReduce

Qiu Ningjia (邱宁佳)1, Li Bin (李宾)1, Wang Peng (王鹏)2, Yang Huamin (杨华民)1, Wang Weiqi (王玮琦)3

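  1. Changchun University of Science and Technology
  2. Room 1113, Block A, Science and Technology Building, South Campus, Changchun University of Science and Technology
  3. Harbin Engineering University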
  • Received: 2016-11-16 Revised: 2016-12-20 Published: 2016-12-20
  • Corresponding author: Qiu Ningjia (邱宁佳)

Research on a New Density Clustering Algorithm Based on MapReduce

  • Received: 2016-11-16 Revised: 2016-12-20 Online: 2016-12-20

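Abstract (translated from Chinese): To address the empirical choice of parameters and the low execution efficiency of the DBSCAN algorithm, this paper proposes an adaptive DBSCAN algorithm based on a genetic algorithm. The data set is reduced twice, according to its similarity and dissimilarity, so that the data are serialized sensibly. To address clustering quality and algorithm efficiency, the genetic algorithm is used to plan the dense-region threshold minPts and the scanning radius Eps; combined with the MapReduce parallel programming framework, clustering is then parallelized on a Hadoop cluster using the resulting thresholds. Experimental results show that the improved algorithm (GA-DBSCANMR) achieves higher execution efficiency and clustering quality than the original DBSCAN algorithm.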

Keywords: DBSCAN, minPts, Eps, genetic algorithm, MapReduce

Abstract: To improve clustering quality and algorithm efficiency, this paper proposes an adaptive DBSCAN algorithm based on a genetic algorithm. The algorithm serializes the data sensibly by analyzing the similarity and dissimilarity in the data set. First, the genetic algorithm sets minPts (the dense-region threshold) and Eps (the scanning radius). Then, combined with the MapReduce parallel programming framework, the obtained thresholds are used to perform parallel clustering on a Hadoop cluster. Finally, experimental results show that the improved algorithm (GA-DBSCANMR) achieves higher execution efficiency and clustering quality than the original DBSCAN algorithm.

Key words: DBSCAN, minPts, Eps, genetic algorithm, MapReduce
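
To make the roles of the two parameters named in the abstract concrete, the following is a minimal sketch of plain, single-machine DBSCAN in Python. It is not the paper's GA-DBSCANMR: the genetic-algorithm parameter search, the two data reductions, and the MapReduce parallelization are not shown. The function names (`region_query`, `dbscan`), the sample points, and the chosen values of `eps` and `min_pts` are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used here for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                queue.extend(j_neighbors)
    return labels


# Hypothetical sample data: two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
labels_small_eps = dbscan(points, eps=0.5, min_pts=3)
```

With `eps=1.5` the two squares form separate clusters and the isolated point is noise; with `eps=0.5` no point has enough neighbors, so every point is labeled noise. This sensitivity to Eps and minPts is the reason the abstract gives for choosing them automatically rather than empirically.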

CLC number: