Journal of Computer Applications

• Database Technology

Algorithm of finding Outlier for reclustering based on distance

XueSong Xu

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology
  • Received: 2006-04-24  Revised: 2006-06-15  Online: 2006-10-01  Published: 2006-10-01
  • Corresponding author: XueSong Xu

Abstract: The Cell-Based algorithm for finding distance-based outliers was first studied with respect to how it identifies, analyzes, and evaluates outliers, and its advantages and shortcomings were pointed out. A new outlier detection algorithm, a distance-based reclustering algorithm, was then proposed. Theoretical analysis and simulation results show that the new algorithm overcomes two weaknesses of the traditional Cell-Based approach: its cell structure must be rebuilt whenever the parameters change, and it is suitable only for low-dimensional data. The new algorithm also avoids the problem of different random choices of initial values yielding different sets of outliers, and it converges relatively quickly.

Key words: clustering, distance, outlier
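For reference, the sketch below illustrates the distance-based outlier notion that the Cell-Based algorithm targets, assuming Knorr and Ng's DB(p, D) definition (an object is an outlier if at least a fraction p of the data lies farther than distance D from it). It is a naive nested-loop baseline, not the Cell-Based algorithm itself and not the reclustering algorithm proposed in this paper. The function names (`is_db_outlier`, `db_outliers`, `grid_cell_count`) and the assumption that the data fills a hypercube of side L are hypothetical, chosen only to show why a grid whose cell side is D / (2√k) is tied to the parameter D and grows exponentially with the dimension k.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def is_db_outlier(o: Point, data: List[Point], p: float, D: float) -> bool:
    """DB(p, D) test: o is an outlier if at least a fraction p of the
    objects in data lie at distance greater than D from o."""
    far = sum(1 for x in data if math.dist(o, x) > D)
    return far >= p * len(data)


def db_outliers(data: List[Point], p: float, D: float) -> List[int]:
    """Naive nested-loop detector: O(n^2) distance computations,
    rerun in full whenever p or D changes."""
    return [i for i, o in enumerate(data) if is_db_outlier(o, data, p, D)]


def grid_cell_count(L: float, D: float, k: int) -> int:
    """Number of cells in a Cell-Based style grid over a k-dimensional
    hypercube of side L (an illustrative assumption). The cell side
    D / (2*sqrt(k)) depends on D, so changing D means rebuilding the grid,
    and the cell count grows exponentially with k."""
    side = D / (2 * math.sqrt(k))
    return math.ceil(L / side) ** k


if __name__ == "__main__":
    # Seven points clustered near the origin plus one far-away point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (10, 10)]
    print(db_outliers(data, p=0.75, D=3.0))  # [7]
    for k in (2, 4, 6):
        print(k, grid_cell_count(L=10.0, D=1.0, k=k))
```

Running it flags only the far-away point at index 7, and the grid grows from 841 cells at k = 2 to 2,560,000 at k = 4. That growth is consistent with the abstract's point that cell-based detection is practical only for low-dimensional data.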