Outlier detection algorithm based on graph random walk
DU Xusheng1, YU Jiong1,2, YE Lele3, CHEN Jiaying2
1.School of Software, Xinjiang University, UrumqiXinjiang 830008, China
2.College of Information Science and Engineering, Xinjiang University, UrumqiXinjiang 830046, China
3.School of Software Engineering, Xi’an Jiaotong University, Xi’an Shannxi 710049, China
Outlier detection algorithms are widely used in various fields such as network intrusion detection, and medical aided diagnosis. Local Distance-Based Outlier Factor (LDOF), Cohesiveness-Based Outlier Factor (CBOF) and Local Outlier Factor (LOF) algorithms are classic algorithms for outlier detection with long execution time and low detection rate on large-scale datasets and high dimensional datasets. Aiming at these problems, an outlier detection algorithm Based on Graph Random Walk (BGRW) was proposed. Firstly, the iterations, damping factor and outlier degree for every object in the dataset were initialized. Then, the transition probability of the rambler between objects was deduced based on the Euclidean distance between the objects. And the outlier degree of every object in the dataset was calculated by iteration. Finally, the objects with highest outlier degree were output as outliers. On UCI (University of California, Irvine) real datasets and synthetic datasets with complex distribution, comparison between BGRW and LDOF, CBOF, LOF algorithms about detection rate, execution time and false positive rate were carried out. The experimental results show that BGRW is able to decrease execution time and false positive rate, and has higher detection rate.
1 HAWKINS D M . Identification of Outliers[M]. London: Chapman and Hall, 1980: 1-45.
2 PAULHEIM H , MEUSEL R . A decomposition of the outlier detection problem into a set of supervised learning problems[J]. Machine Learning, 2015, 100(2/3): 509-531.
3 TERZI D S , TERZI R , SAGIROGLU S . Big data analytics for network anomaly detection from netflow data[C]// Proceedings of the 2017 International Conference on Computer Science and Engineering. Piscataway: IEEE, 2017: 592-597.
4 LI W , MO W , ZHANG X , et al . Burn injury diagnostic imaging device’s accuracy improved by outlier detection and removal[C]// Proceedings of SPIE 9472, Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XXI. Bellingham, WA: SPIE, 2015: No.947206.
5 WEST J , BHATTACHARYA M . Intelligent financial fraud detection: a comprehensive review[J]. Computers and Security, 2016, 57: 47-66.
6 GUO J , HUANG W , WILLIAMS B M . Real time traffic flow outlier detection using short-term traffic conditional variance prediction[J]. Transportation Research Part C: Emerging Technologies, 2015, 50: 160-172.
7 SILVA A P D , FILZMOSER P , BRITO P . Outlier detection in interval data[J]. Advances in Data Analysis and Classification, 2018, 12(3): 785-822.
8 PIEPEL G F . Robust regression and outlier detection[J]. Technometrics, 1989, 31(2):260-261.
9 KIRNER E , SCHUBERT E , ZIMEK A . Good and bad neighborhood approximations for outlier detection ensembles[C]// Proceedings of the 2017 International Conference on Similarity Search and Applications, LNCS 10609. Cham: Springer, 2017: 173-187.
10 HAN J , KAMBER M , PEI J . Mining frequent patterns, associations , and correlations: basic concepts and methods[M]// Data Mining: Concepts and Techniques. 3rd ed. San Francisco, CA: Morgan Kaufmann, 2012: 243-278.
11 AHMED M , MAHMOOD A N , HU J . A survey of network anomaly detection techniques[J]. Journal of Network and Computer Applications, 2016, 60: 19-31.
12 AMER M , GOLDSTEIN M , ABDENNADHER S . Enhancing one-class support vector machines for unsupervised anomaly detection[C]// Proceedings of the 2013 ACM SIGKDD Workshop on Outlier Detection and Description. New York: ACM, 2013: 8-15.
13 BREUNIG M M , KRIEGEL H P , NG R T, et al . LOF: identifying density-based local outliers[C]// Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2000: 93-104.
14 SCHREYER M , SATTAROV T , BORTH D , et al . Detection of anomalies in large scale accounting data using deep autoencoder networks[EB/OL].[2018-08-01].https://arxiv.org/pdf/1709.05254.pdf.
15 LIU F T , TING K , ZHOU Z . Isolation-based anomaly detection[J]. ACM Transactions on Knowledge Discovery from Data, 2012, 6(1):No.3.
16 ZHANG K , HUTTER M , JIN H . A new local distance-based outlier detection approach for scattered real-world data[C]// Proceedings of the 2009 Pacific-Asia Conference on Knowledge Discovery and Data Mining, LNCS 5476. Berlin: Springer, 2009: 813-822.
17 JOSHI V , BHATNAGAR R . CBOF: cohesiveness-based outlier factor A novel definition of outlier-ness[C]// Proceedings of the 2014 International Workshop on Machine Learning and Data Mining in Pattern Recognition, LNCS 8556. Cham: Springer, 2014: 175-189.
18 袁钟,冯山 . 基于邻域值差异度量的离群点检测算法[J]. 计算机应用, 2018, 38(7): 1905-1909. (YUAN Z, FENG S. Outlier detection algorithm based on neighborhood value difference metric[J]. Journal of Computer Applications, 2018, 38(7): 1905-1909.)
19 王习特,申德荣,白梅,等 . BOD:一种高效的分布式离群点检测算法[J]. 计算机学报, 2016, 39(1): 36-51. WANG X T , SHEN D R , BAI M , et al . BOD: an efficient algorithm for distributed outlier detection[J]. Chinese Journal of Computers, 2016, 39(1): 36-51.
20 SCHIFF G D , VOLK L A , VOLODARSKAYA M , et al . Screening for medication errors using an outlier detection system[J]. Journal of the American Medical Informatics Association, 2017, 24(2): 281-287.
21 HUANG A . A risk detection system of e-commerce: researches based on soft information extracted by affective computing Web texts[J]. Electronic Commerce Research, 2018, 18(1): 143-157.
22 ALAVERDYAN Z , JUNG J , BOUET R , et al . Regularized siamese neural network for unsupervised outlier detection on brain multiparametric magnetic resonance imaging: application to epilepsy lesion screening[J]. Medical Image Analysis, 2020, 60: No.101618.