Journal of Computer Applications, 2023, Vol. 43, Issue (6): 1705-1712. DOI: 10.11772/j.issn.1001-9081.2022060930
The 37th CCF National Conference of Computer Applications (CCF NCCA 2022)
Zhongping ZHANG, Xin GUO, Yuting ZHANG, Ruibo ZHANG
Received: 2022-06-28
Revised: 2022-08-10
Accepted: 2022-08-12
Online: 2022-09-22
Published: 2023-06-10
Contact: Zhongping ZHANG
About author: ZHANG Zhongping, born in 1972 in Songyuan, Jilin, Ph. D., professor, CCF member. His research interests include big data, data mining and semi-structured data. Email: 979935240@qq.com
GUO Xin, born in 1997, M. S. candidate. Her research interests include data mining.
Zhongping ZHANG, Xin GUO, Yuting ZHANG, Ruibo ZHANG. Outlier detection algorithm based on hologram stationary distribution factor[J]. Journal of Computer Applications, 2023, 43(6): 1705-1712.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022060930
Hardware/software | Specification | Hardware/software | Specification |
---|---|---|---|
CPU | 3.20 GHz Intel i5-6500 | Development environment | PyCharm |
Hard disk | 1 TB | Language | Python 3.8 |
Memory | 8 GB | Visualization tool | Matplotlib |
Tab. 1 Experimental environment
Dataset | Data points | Outliers | Outlier ratio/% |
---|---|---|---|
A1 | 1045 | 34 | 3.2 |
A2 | 1372 | 72 | 5.2 |
A3 | 1038 | 40 | 3.8 |
A4 | 1031 | 35 | 3.3 |
Tab. 2 Data characteristics of synthetic datasets
Dataset | Data points | Dimensions | Outliers | Outlier ratio/% |
---|---|---|---|---|
WDBC | 390 | 30 | 33 | 8.4 |
WBC | 223 | 9 | 10 | 4.0 |
Ionosphere | 351 | 32 | 126 | 35.8 |
Ecoli | 168 | 7 | 25 | 14.8 |
Tab. 3 Data characteristics of real datasets