Fast outlier detection algorithm based on local density

Journal of Computer Applications, 2017, Vol. 37, Issue 10: 2932-2937. DOI: 10.11772/j.issn.1001-9081.2017.10.2932

ZOU Yunfeng¹, ZHANG Xin¹, SONG Shiyuan², NI Weiwei²

1. State Grid Jiangsu Electric Power Research Institute, Nanjing, Jiangsu 210036, China;
2. School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 211189, China

Received: 2017-04-12  Revised: 2017-07-02  Online: 2017-10-10  Published: 2017-10-16
Corresponding author: SONG Shiyuan, E-mail: 2293515844@qq.com
About the authors: ZOU Yunfeng (born 1977), male, from Fengcheng, Jiangxi; senior engineer; research interests: data mining, electric power information systems. ZHANG Xin (born 1987), male, from Nanjing, Jiangsu; M.S.; research interests: data integration, electric power information systems. SONG Shiyuan (born 1992), male, from Pingdingshan, Henan; master's student; research interests: data mining, data privacy protection. NI Weiwei (born 1979), male, from Huaiyin, Jiangsu; professor, doctoral supervisor, Ph.D.; research interests: data mining, data privacy protection, complex data management.
Supported by: This work is partially supported by the National Natural Science Foundation of China (61370077).

Abstract

Outlier mining aims to find exceptional objects that deviate from most of the rest of the data set. Density-based outlier detection has attracted much attention, but the density-based Local Outlier Factor (LOF) algorithm is not suited to data sets with abnormal distributions. The INFLuenced Outlierness (INFLO) algorithm addresses this case by analyzing both the k nearest neighbors and the reverse k nearest neighbors of each data point, but it does so for every data point without distinction, which lowers its efficiency. To solve this problem, a local density-based algorithm named Local Density Based Outlier detection (LDBO) is proposed, which aims to improve outlier detection efficiency and effectiveness simultaneously. LDBO introduces the notions of strong k nearest neighbors and weak k nearest neighbors to analyze the outlier correlation among nearby data points, so that data points can be treated differently. Furthermore, a prejudgement strategy for the outlierness of data points is applied to avoid unnecessary reverse k nearest neighbor analysis as far as possible, improving detection efficiency on abnormally distributed data. Theoretical analysis and experimental results indicate that LDBO is more efficient than INFLO and that the algorithm is effective and feasible.

Key words: outlier detection, local density, strong k nearest neighbors, weak k nearest neighbors, Reverse k Nearest Neighbors (RkNN)
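
To make the k nearest neighbor and reverse k nearest neighbor sets mentioned in the abstract concrete, here is a minimal Python sketch of an INFLO-style score, the baseline that LDBO is compared against. It is not the LDBO algorithm: the abstract does not define strong or weak k nearest neighbors or the prejudgement rule, so they are left out. The function names (`knn_sets`, `reverse_knn_sets`, `inflo_scores`) and the toy data are hypothetical, and ties at the k-distance are broken by index as a simplifying assumption.

```python
import math

def knn_sets(points, k):
    """Return (knn, kdist): each point's k nearest neighbours and its k-distance."""
    knn, kdist = [], []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to p (ties broken by index).
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        knn.append({j for _, j in ranked[:k]})
        kdist.append(ranked[k - 1][0])
    return knn, kdist

def reverse_knn_sets(knn):
    """RkNN(p): all points that list p among their own k nearest neighbours."""
    rknn = [set() for _ in knn]
    for i, neighbours in enumerate(knn):
        for j in neighbours:
            rknn[j].add(i)
    return rknn

def inflo_scores(points, k):
    """INFLO-style score: mean density over (kNN union RkNN) divided by own density."""
    knn, kdist = knn_sets(points, k)
    rknn = reverse_knn_sets(knn)
    # Density is taken as 1 / k-distance; assumes no k-distance is zero (no duplicate points).
    density = [1.0 / d for d in kdist]
    scores = []
    for i in range(len(points)):
        influence = knn[i] | rknn[i]
        mean_density = sum(density[j] for j in influence) / len(influence)
        scores.append(mean_density / density[i])
    return scores

# Hypothetical toy data: a tight cluster plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
scores = inflo_scores(points, k=2)
outlier_index = max(range(len(points)), key=scores.__getitem__)
```

On this toy data the distant point `(8.0, 8.0)` receives by far the largest score. The sketch builds both neighbor sets for every point, which is the undifferentiated analysis that the abstract identifies as the source of INFLO's inefficiency.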

CLC Number: TP274

ZOU Yunfeng, ZHANG Xin, SONG Shiyuan, NI Weiwei. Fast outlier detection algorithm based on local density[J]. Journal of Computer Applications, 2017, 37(10): 2932-2937.

