Journal of Computer Applications, 2017, Vol. 37, Issue (10): 2932-2937. DOI: 10.11772/j.issn.1001-9081.2017.10.2932

• Data Science and Technology •

Fast outlier detection algorithm based on local density

ZOU Yunfeng1, ZHANG Xin1, SONG Shiyuan2, NI Weiwei2

  1. Electric Power Research Institute, State Grid Jiangsu Electric Power Company, Nanjing Jiangsu 210036, China;
    2. School of Computer Science and Engineering, Southeast University, Nanjing Jiangsu 211189, China
  • Received: 2017-04-12 Revised: 2017-07-02 Online: 2017-10-10 Published: 2017-10-16
  • Corresponding author: SONG Shiyuan, E-mail: 2293515844@qq.com
  • About the authors: ZOU Yunfeng (1977-), male, born in Fengcheng, Jiangxi, senior engineer. His research interests include data mining and electric power information systems. ZHANG Xin (1987-), male, born in Nanjing, Jiangsu, M. S. His research interests include data integration and electric power information systems. SONG Shiyuan (1992-), male, born in Pingdingshan, Henan, M. S. candidate. His research interests include data mining and data privacy protection. NI Weiwei (1979-), male, born in Huaiyin, Jiangsu, Ph. D., professor, doctoral supervisor. His research interests include data mining, data privacy protection and complex data management.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61370077).

Abstract: Outlier mining aims to find exceptional objects that deviate from the rest of a data set. Density-based outlier detection has attracted much attention, but the density-based Local Outlier Factor (LOF) algorithm is not suitable for data sets with abnormal distributions. The INFLuenced Outlierness (INFLO) algorithm solves this problem by introducing reverse k nearest neighbor sets. However, it analyzes both the k nearest neighbors and the reverse k nearest neighbors of every data point without distinction, which lowers its efficiency. To address this, a local density-based algorithm named Local Density Based Outlier detection (LDBO) was proposed, which improves both the efficiency and the effectiveness of outlier detection. LDBO introduces the concepts of strong k nearest neighbors and weak k nearest neighbors, and treats data points differently by analyzing the outlier correlation among nearby data points. Furthermore, an outlierness prejudgment strategy avoids unnecessary reverse k nearest neighbor analysis as far as possible, which improves the efficiency of outlier detection on data with abnormal distributions. Theoretical analysis and experimental results indicate that LDBO outperforms INFLO in efficiency and that the algorithm is effective and feasible.

Key words: outlier detection, local density, strong k nearest neighbors, weak k nearest neighbors, Reverse k Nearest Neighbors (RkNN)
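To make the two baseline scores in the abstract concrete, here is a minimal brute-force Python sketch of the standard LOF and INFLO formulations. It is not the paper's LDBO algorithm, because this section does not define strong or weak k nearest neighbors or the prejudgment rule. All function names are hypothetical. The sketch assumes the following:

* points are distinct tuples;
* 1 <= k < number of points;
* each kNN set holds exactly k points, with ties broken by index, whereas the original definitions keep all tied points.

```python
import math

# Brute-force sketch of the standard LOF and INFLO scores (not LDBO).
# Assumptions: distinct points, 1 <= k < len(points), and exactly k
# neighbours per point with ties broken by index (a simplification).


def knn(points, k):
    """Return (neighbours, kdist): the indices of the k nearest neighbours
    of each point and its k-distance (distance to the k-th neighbour)."""
    neighbours, kdist = [], []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = ranked[:k]
        neighbours.append([j for _, j in nearest])
        kdist.append(nearest[-1][0])
    return neighbours, kdist


def reverse_knn(neighbours):
    """RkNN(p) = {q : p is among the k nearest neighbours of q}."""
    rev = [set() for _ in neighbours]
    for q, nn in enumerate(neighbours):
        for p in nn:
            rev[p].add(q)
    return rev


def lof(points, k):
    """Local Outlier Factor: mean ratio of the neighbours' local
    reachability density (lrd) to the point's own lrd."""
    nn, kdist = knn(points, k)
    lrd = []
    for p in range(len(points)):
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
        reach = [max(kdist[o], math.dist(points[p], points[o])) for o in nn[p]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[o] for o in nn[p]) / (len(nn[p]) * lrd[p])
            for p in range(len(points))]


def inflo(points, k):
    """INFLO: average density over the influence space kNN(p) | RkNN(p),
    divided by the density of p, where density = 1 / k-distance."""
    nn, kdist = knn(points, k)
    rnn = reverse_knn(nn)  # requires the kNN lists of *all* points
    den = [1.0 / d for d in kdist]
    scores = []
    for p in range(len(points)):
        space = set(nn[p]) | rnn[p]
        scores.append(sum(den[o] for o in space) / len(space) / den[p])
    return scores


if __name__ == "__main__":
    # Four points on a unit square plus one isolated point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 12)]
    print("LOF:  ", [round(s, 2) for s in lof(data, 2)])
    print("INFLO:", [round(s, 2) for s in inflo(data, 2)])
```

On this toy data the isolated point (10, 12) gets the largest score under both measures. Note that `inflo` must build the reverse kNN sets of every point before it can score any single point. That is the per-point cost the abstract attributes to INFLO, and LDBO's prejudgment strategy is described as avoiding it where possible. The sketch is O(n² log n) and is meant only for illustration.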
