%0 Journal Article
%A FAN Zhongxin
%A MIAO Chunsheng
%A WANG Xing
%T Improved BIRCH clustering algorithm based on connectivity distance and intensity
%D 2019
%R 10.11772/j.issn.1001-9081.2018081790
%J Journal of Computer Applications
%P 1027-1031
%V 39
%N 4
%X To address three shortcomings of Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), namely that its clustering results depend on the order in which data objects are added, that it performs poorly on non-convex clusters, and that the cluster diameter threshold restricts each cluster to a similar number of data objects, an improved BIRCH algorithm was proposed. In this algorithm, the cluster diameter threshold was replaced by connectivity distance and intensity threshold, which describe the connectivity between data objects, and a cluster merging step was added to the generation of the cluster feature tree. Experimental results on custom data and on the iris, wine and pendigits datasets show that the proposed algorithm achieves higher clustering accuracy than existing improved algorithms such as multi-threshold BIRCH and density-improved BIRCH; on large datasets in particular, its accuracy is 6 percentage points higher and its running time 61% lower than those of density-improved BIRCH. The proposed algorithm can be applied to online real-time incremental data processing, can identify non-convex clusters and clusters of uneven volume, has a denoising capability, and significantly reduces time complexity and space complexity.
%U http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2018081790