Fast outlier detection algorithm based on local density

doi:10.11772/j.issn.1001-9081.2017.10.2932

Journal of Computer Applications, 2017, Vol. 37, Issue (10): 2932-2937. DOI: 10.11772/j.issn.1001-9081.2017.10.2932

ZOU Yunfeng¹, ZHANG Xin¹, SONG Shiyuan², NI Weiwei²

1. State Grid Jiangsu Electric Power Company, Electric Power Research Institute, Nanjing Jiangsu 210036, China;
2. School of Computer Science and Engineering, Southeast University, Nanjing Jiangsu 211189, China

Received: 2017-04-12 Revised: 2017-07-02 Online: 2017-10-10 Published: 2017-10-16
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61370077).

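Corresponding author: SONG Shiyuan (宋世渊, born 1992), male, from Pingdingshan, Henan; M.S. candidate; research interests: data mining, data privacy protection. E-mail: 2293515844@qq.com
About the authors: ZOU Yunfeng (邹云峰, born 1977), male, from Fengcheng, Jiangxi; senior engineer; research interests: data mining, electric power information systems. ZHANG Xin (张昕, born 1987), male, from Nanjing, Jiangsu; M.S.; research interests: data integration, electric power information systems. SONG Shiyuan (宋世渊, born 1992), male, from Pingdingshan, Henan; M.S. candidate; research interests: data mining, data privacy protection. NI Weiwei (倪巍伟, born 1979), male, from Huaiyin, Jiangsu; professor, doctoral supervisor, Ph.D.; research interests: data mining, data privacy protection, complex data management.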

Abstract

Abstract: Outlier mining aims to find exceptional objects that deviate from most of the rest of the data set. Density-based outlier detection has attracted considerable attention, but the density-based Local Outlier Factor (LOF) algorithm is not suitable for data sets with abnormal distributions. The INFLuenced Outlierness (INFLO) algorithm solves this problem by analyzing both the k nearest neighbors and the reverse k nearest neighbors of each data point, at the cost of lower efficiency. To address this efficiency problem, a local density-based algorithm named Local Density Based Outlier detection (LDBO) was proposed, which improves outlier detection efficiency and effectiveness simultaneously. LDBO introduces the definitions of strong k nearest neighbors and weak k nearest neighbors to analyze the outlier relations among nearby data points. Furthermore, to improve detection efficiency, prejudgment is applied to avoid unnecessary reverse k nearest neighbor analysis as far as possible. Theoretical analysis and experimental results indicate that LDBO outperforms INFLO in efficiency and that it is effective and feasible.

Key words: outlier detection, local density, strong k nearest neighbors, weak k nearest neighbors, Reverse k Nearest Neighbors (RkNN)
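
To make the neighbor sets named in the abstract concrete, the following is a minimal Python sketch of the INFLO baseline as it is commonly defined (Jin et al., 2006), not of LDBO itself: the abstract does not give the definitions of strong and weak k nearest neighbors or of the prejudgment step, so they are not implemented here. The function names `knn_and_rknn` and `inflo_scores`, the brute-force distance computation, and the assumption that the data contain no duplicate points are all illustrative choices, not part of the paper.

```python
import numpy as np

def knn_and_rknn(X, k):
    """Brute-force k nearest neighbors (kNN) and reverse kNN (RkNN) for each row of X."""
    n = len(X)
    # Pairwise Euclidean distances; the diagonal is set to inf so that a
    # point is never counted as its own neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)[:, :k]       # indices of the k nearest points (ties broken arbitrarily)
    kdist = d[np.arange(n), order[:, -1]]      # distance to the k-th nearest neighbor
    knn = [set(row.tolist()) for row in order]
    rknn = [set() for _ in range(n)]
    for p in range(n):
        for q in knn[p]:
            rknn[q].add(p)                     # q is in kNN(p), so p is in RkNN(q)
    return knn, rknn, kdist

def inflo_scores(X, k):
    """INFLO-style score: mean density of kNN(p) and RkNN(p) combined, divided by density of p."""
    knn, rknn, kdist = knn_and_rknn(X, k)
    density = 1.0 / kdist                      # assumption: no duplicate points, so kdist > 0
    scores = np.empty(len(X))
    for p in range(len(X)):
        influence = list(knn[p] | rknn[p])     # influence space of p
        scores[p] = density[influence].mean() / density[p]
    return scores

# Hypothetical toy data: five clustered points and one isolated point (index 5).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
scores = inflo_scores(X, k=2)
print(int(np.argmax(scores)))                  # index of the point with the largest score
```

On this toy data the isolated point has an empty reverse k nearest neighbor set and receives the largest score. The sketch computes both neighbor sets for every point, which is the cost the abstract attributes to INFLO and which LDBO is reported to reduce.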

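Abstract (translated from Chinese): The existing density-based outlier detection algorithm LOF cannot handle outlier detection on abnormally distributed data. The INFLO algorithm addresses this case effectively by introducing the reverse k nearest neighbor set, but it analyzes the k nearest neighbors and the reverse k nearest neighbor set of every data point indiscriminately, which lowers efficiency. To address this problem, a local density-based outlier detection algorithm, LDBO, is proposed. It introduces the concepts of strong k nearest neighbors and weak k nearest neighbors and treats data points differently by analyzing the outlier correlation of neighboring data points. It also proposes a prejudgment strategy for the outlierness of data points that avoids unnecessary reverse k nearest neighbor analysis as far as possible, which effectively improves the efficiency of outlier detection on abnormally distributed data. Theoretical analysis and experimental results show that LDBO is more efficient than INFLO and that the algorithm is effective and feasible.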

CLC Number: TP274

ZOU Yunfeng, ZHANG Xin, SONG Shiyuan, NI Weiwei. Fast outlier detection algorithm based on local density[J]. Journal of Computer Applications, 2017, 37(10): 2932-2937.
