Journal of Computer Applications, 2017, Vol. 37, Issue (10): 2932-2937. DOI: 10.11772/j.issn.1001-9081.2017.10.2932


Fast outlier detection algorithm based on local density

ZOU Yunfeng1, ZHANG Xin1, SONG Shiyuan2, NI Weiwei2   

  1. Electric Power Research Institute, State Grid Jiangsu Electric Power Company, Nanjing Jiangsu 210036, China;
    2. School of Computer Science and Engineering, Southeast University, Nanjing Jiangsu 211189, China
  • Received: 2017-04-12; Revised: 2017-07-02; Online: 2017-10-10; Published: 2017-10-16
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61370077).

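  • Corresponding author: SONG Shiyuan (1992-), male, born in Pingdingshan, Henan, M.S. candidate; research interests: data mining, data privacy protection. E-mail: 2293515844@qq.com
  • About the authors: ZOU Yunfeng (1977-), male, born in Fengcheng, Jiangxi, senior engineer; research interests: data mining, electric power information systems. ZHANG Xin (1987-), male, born in Nanjing, Jiangsu, M.S.; research interests: data integration, electric power information systems. SONG Shiyuan (1992-), male, born in Pingdingshan, Henan, M.S. candidate; research interests: data mining, data privacy protection. NI Weiwei (1979-), male, born in Huaiyin, Jiangsu, professor, Ph.D. supervisor, Ph.D.; research interests: data mining, data privacy protection, complex data management.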

Abstract: Outlier mining aims to find exceptional objects that deviate from the rest of a data set. Density-based outlier detection has attracted considerable attention, but the density-based Local Outlier Factor (LOF) algorithm is not suitable for data sets with abnormal data distributions. The INFLuenced Outlierness (INFLO) algorithm solves this problem by analyzing both the k nearest neighbors and the reverse k nearest neighbors of each data point, but because it does so indiscriminately for every point, its efficiency suffers. To address this, a local density-based algorithm named Local Density Based Outlier detection (LDBO) is proposed, which improves the efficiency and the effectiveness of outlier detection simultaneously. LDBO introduces the definitions of strong k nearest neighbors and weak k nearest neighbors to analyze the outlier correlation among neighboring data points, so that different data points can be treated differently. Furthermore, to improve efficiency, a prejudgment strategy for the outlierness of data points is applied to avoid unnecessary reverse k nearest neighbor analysis as far as possible. Theoretical analysis and experimental results indicate that LDBO is more efficient than INFLO and that it is effective and feasible.

Key words: outlier detection, local density, strong k nearest neighbors, weak k nearest neighbors, Reverse k Nearest Neighbors (RkNN)
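The abstract contrasts LOF, which compares a point's density only with that of its k nearest neighbors, against INFLO, which compares it with its whole influence space (k nearest neighbors plus reverse k nearest neighbors). The following is a minimal Python sketch of these two baselines using their standard textbook definitions, assuming brute-force neighbor search, index-based tie-breaking and no duplicate points. It does not implement LDBO itself: the strong/weak k nearest neighbor definitions and the prejudgment rule are not given in this abstract, so they are not reproduced here. All function names (`knn`, `reverse_knn`, `lof_scores`, `inflo_scores`) are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def knn(data: List[Point], k: int) -> List[List[int]]:
    """Brute-force k nearest neighbors of every point (the point itself excluded).

    Ties at the k-th distance are broken by index rather than included,
    a simplification of the textbook k-distance neighborhood.
    """
    result = []
    for i, p in enumerate(data):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(data) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def reverse_knn(neigh: List[List[int]]) -> List[Set[int]]:
    """RkNN(p) = {q : p in kNN(q)}; requires the kNN lists of ALL points."""
    rev: List[Set[int]] = [set() for _ in neigh]
    for q, ns in enumerate(neigh):
        for p in ns:
            rev[p].add(q)
    return rev


def k_distances(data: List[Point], neigh: List[List[int]]) -> List[float]:
    """Distance from each point to its k-th nearest neighbor."""
    return [math.dist(data[i], data[ns[-1]]) for i, ns in enumerate(neigh)]


def lof_scores(data: List[Point], k: int) -> List[float]:
    """LOF: compares a point's local reachability density with its kNN only."""
    neigh = knn(data, k)
    kd = k_distances(data, neigh)
    lrd = []
    for i, ns in enumerate(neigh):
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
        reach = [max(kd[o], math.dist(data[i], data[o])) for o in ns]
        lrd.append(len(reach) / sum(reach))  # assumes no duplicate points
    return [sum(lrd[o] for o in ns) / (len(ns) * lrd[i]) for i, ns in enumerate(neigh)]


def inflo_scores(data: List[Point], k: int) -> List[float]:
    """INFLO: compares density with the influence space, the union of kNN(p) and RkNN(p)."""
    neigh = knn(data, k)
    rev = reverse_knn(neigh)
    # den(p) = 1 / k-distance(p); assumes no duplicate points (k-distance > 0)
    den = [1.0 / d for d in k_distances(data, neigh)]
    scores = []
    for i, ns in enumerate(neigh):
        influence = set(ns) | rev[i]
        avg_den = sum(den[o] for o in influence) / len(influence)
        scores.append(avg_den / den[i])
    return scores


if __name__ == "__main__":
    # A tight square cluster plus one distant point (index 5).
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
    print([round(s, 2) for s in lof_scores(pts, 3)])
    print([round(s, 2) for s in inflo_scores(pts, 3)])
```

On this toy data both scores single out the distant point with a value far above 1, while the cluster points score around 1 or below. Note that `reverse_knn` can only be built after the kNN lists of every point are known; this whole-data-set analysis, applied to every point without distinction, is the overhead that the abstract attributes to INFLO and that LDBO's prejudgment step tries to avoid.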

