计算机应用 ›› 2005, Vol. 25 ›› Issue (08): 1824-1826.DOI: 10.3724/SP.J.1087.2005.01824

• 数据库与人工智能 • 上一篇    下一篇

一种基于密度的高效聚类算法

石陆魁1,2,何丕廉1   

  1. 1.天津大学计算机科学与技术系; 2.工业大学计算机科学与软件学院
  • 发布日期:2011-04-07 出版日期:2005-08-01
  • 基金资助:

    天津市科技发展计划资助项目(04310941R)

Efficient density-based clustering algorithm

SHI Lu-kui1,2,HE Pi-lian1   

  1. 1.Department of Computer Science and Technology,Tianjin University,Tianjin 300072,China; 2.School of Computer Science and Software,Hebei University of Technology,Tianjin 300130,China
  • Online:2011-04-07 Published:2005-08-01

摘要: 在聚类算法DBSCAN(DensityBasedSpatialClusteringofApplicationswithNoise)的基础上,提出了一种基于密度的高效聚类算法。该算法首先对样本集按某一维排序,然后通过在核心点的邻域外按顺序选择一个未标记的样本点来扩展种子点,以便减少查询次数,降低聚类的时间花费。对样本进行非线性核变换后再进行聚类可以有效地改善聚类的质量。理论分析表明,该算法的时间复杂性接近于线性复杂度。同时测试结果也表明新算法的时间复杂度和聚类质量都显著优于DBSCAN算法。

关键词: 聚类分析, DBSCAN, 核变换

Abstract: An efficient density-based clustering algorithm was presented based on DBSCAN(Density Based Spatial Clustering of Applications with Noise). In this method, objects were sorted by a certain dimensional coordinate at first. Then the new algorithm selected in order unlabelled points outside a core objects neighborhood as seeds to expand clusters so that the execution frequency of region queries could be decreased, and consequently the time cost was diminished. Transforming objects with a non-linear kernel function could effectively improve the clustering accuracy. The theoretic analysis demonstrates that the time complexity of this algorithm is approximately linear. Experimental results also show that the time efficiency and the clustering quality of the new algorithm are greatly superior to those of the original DBSCAN.

Key words: clustering analysis, DBSCAN, kernel transformation

中图分类号: