A Fast Outlier Detection Algorithm Based on Local Density

Zou Yunfeng, Zhang Xin, Song Shiyuan, Ni Weiwei

  1. Southeast University
  • Received: 2017-04-12 Revised: 2017-06-15 Online: 2017-06-15
  • Corresponding author: Song Shiyuan

Abstract: Outlier detection is an important research direction in data mining. Its goal is to find, within a large data set, the small number of objects that differ markedly from the majority of the data. Density-based outlier detection is a representative technique and has received sustained attention. The existing density-based algorithm LOF cannot correctly detect outliers when the data are distributed abnormally. The INFLO algorithm solves this problem by introducing the reverse k-nearest-neighbor set, but it must analyze the k nearest neighbors and the reverse k nearest neighbors of every data point indiscriminately, which makes it inefficient and poorly suited to large data sets. To address these problems, this paper proposes LDBO, a local density-based outlier detection algorithm. LDBO introduces the concepts of vigorous (strong) k-neighbor points and weak k-neighbor points so that data points are treated differently, and it prejudges the outlierness of data points. Together these measures effectively improve the efficiency of outlier detection on abnormally distributed data. Theoretical analysis and experimental results show that LDBO is more efficient than INFLO and that the algorithm is effective and feasible.

Key words: outlier detection, local density, vigorous k-neighbor point, weak k-neighbor point, reverse k-nearest neighbors
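To make the efficiency argument concrete, the following is a minimal brute-force sketch of the INFLO baseline as it is commonly defined: the density of a point is the reciprocal of its k-distance, its influence space is the union of its k nearest neighbors and its reverse k nearest neighbors, and its score is the mean density of the influence space divided by its own density. This is not LDBO, since the abstract does not define vigorous and weak k-neighbor points. The sketch only illustrates the per-point kNN and reverse-kNN work that LDBO aims to avoid. The function names `knn` and `inflo_scores` and the tie-breaking rule (ties at the k-distance broken by point index) are illustrative assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn(points: Sequence[Point], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbours of every point (brute force, O(n^2 log n)).

    Assumption: ties at the k-distance are broken by point index, a simplification
    of the full k-distance neighbourhood used in the literature.
    """
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    result = []
    for i, p in enumerate(points):
        # Sort the other points by (distance, index) and keep the first k.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def inflo_scores(points: Sequence[Point], k: int) -> List[float]:
    """INFLO score per point: mean density of its influence space
    (kNN union reverse kNN) divided by the point's own density."""
    neighbours = knn(points, k)
    # density(p) = 1 / k-distance(p); the last kNN entry is the k-th nearest.
    # Assumes no point has k exact duplicates (which would give k-distance 0).
    density = [
        1.0 / math.dist(points[i], points[nn[-1]]) for i, nn in enumerate(neighbours)
    ]
    # q is a reverse neighbour of p when p is among q's k nearest neighbours.
    reverse: List[Set[int]] = [set() for _ in points]
    for q, nn in enumerate(neighbours):
        for p in nn:
            reverse[p].add(q)
    scores = []
    for p, nn in enumerate(neighbours):
        influence = set(nn) | reverse[p]
        avg_density = sum(density[o] for o in influence) / len(influence)
        scores.append(avg_density / density[p])
    return scores


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    print(inflo_scores(pts, k=2))
```

With k = 2, the four corners of the unit square score at most 1, while the isolated point (5, 5) scores √41 ≈ 6.4, so it is ranked as the outlier. The sketch computes both neighbor sets for every point regardless of whether that point is an obvious inlier. That indiscriminate cost is what the abstract attributes to INFLO.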
