Journal of Computer Applications (《计算机应用》) › 2025, Vol. 45, Issue (12): 3864-3871. DOI: 10.11772/j.issn.1001-9081.2024121792

• Data Science and Technology •

基于累积概率波动和自动化聚类的异常检测方法 (Anomaly detection method based on cumulative probability fluctuation and automated clustering)

曾君¹,², 童英华¹,², 王得芳¹,²

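  1. College of Computer, Qinghai Normal University, Xining 810008
  2. State Key Laboratory of Tibetan Intelligent Information Processing and Application, jointly established by the province and the ministry (Qinghai Normal University), Xining 810008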
  • Received: 2024-12-19 Revised: 2025-04-01 Accepted: 2025-04-07 Online: 2025-04-22 Published: 2025-12-10
  • Corresponding author: 童英华
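  • About the authors: 曾君, born in 1999, male, native of Zhuzhou, Hunan, M.S. candidate. Research interest: anomaly detection of Internet of Things (IoT) data.
    童英华, born in 1982, female, native of Xining, Qinghai, Ph.D., associate professor, member of the China Computer Federation (CCF). Research interests: fault tolerance in wireless sensor networks, reliability analysis of the Internet of Things.
    王得芳, born in 1978, male, native of Xining, Qinghai, M.S., associate professor. Research interest: data anomaly detection.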
  • Supported by:
    Applied Basic Research Project of Qinghai Province (2023-ZJ-713)

Anomaly detection method based on cumulative probability fluctuation and automated clustering

Jun ZENG¹,², Yinghua TONG¹,², Defang WANG¹,²

  1. College of Computer, Qinghai Normal University, Xining, Qinghai 810008, China
  2. The State Key Laboratory of Tibetan Intelligent Information Processing and Application (Qinghai Normal University), Xining, Qinghai 810008, China
  • Received: 2024-12-19 Revised: 2025-04-01 Accepted: 2025-04-07 Online: 2025-04-22 Published: 2025-12-10
  • Contact: Yinghua TONG
  • About the authors: ZENG Jun, born in 1999, M. S. candidate. His research interests include anomaly detection of Internet of Things (IoT) data.
    TONG Yinghua, born in 1982, Ph. D., associate professor. Her research interests include fault tolerance in wireless sensor networks and reliability analysis of the Internet of Things.
    WANG Defang, born in 1978, M. S., associate professor. His research interests include data anomaly detection.
  • Supported by:
    Applied Basic Research Project of Qinghai Province (2023-ZJ-713)

Abstract:

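As multidimensional data features grow increasingly complex, existing anomaly detection methods show limitations in capturing feature distributions, while traditional clustering and statistical methods face greater challenges in parameter selection; together, these factors limit gains in detection performance. To address these problems, an anomaly detection method based on cumulative probability fluctuation and automated clustering is proposed. First, the cumulative probability fluctuation of each feature is computed to characterize the feature's Gaussian mixture distribution, and the features are compression-transformed according to their cumulative probability fluctuation values. Second, deep reinforcement learning is used within Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to find the optimal clustering parameters, and the compression-transformed dataset is clustered. Finally, the clustering results are combined with the cumulative probability fluctuation values of the data features to decide whether each data point is anomalous. Experimental results show that, on six experimental datasets, the proposed method improves the average precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC) by 36.39%, 2.73%, 14.90%, and 4.84%, respectively, over the best-performing comparison method. The proposed method therefore effectively improves the overall performance of anomaly detection on data with complex multidimensional features without requiring manual parameter selection.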
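The second step pairs DBSCAN with an automatic search for its parameters. The sketch below is illustrative only. It implements plain DBSCAN in standard-library Python. As a deliberately simple stand-in for the deep-reinforcement-learning search described above, it tries every pair in a small grid of `eps` and `min_samples` values and keeps the pair with the highest mean silhouette coefficient over non-noise points. The grid, the silhouette criterion, and the function names (`dbscan`, `silhouette`, `search_params`) are assumptions of this sketch, not the authors' design. The cumulative-probability-fluctuation transform is not modeled.

```python
import math
from itertools import product

NOISE = -1  # label for points DBSCAN leaves unclustered


def dbscan(points, eps, min_samples):
    """Plain DBSCAN; returns one cluster id per point, NOISE for noise.

    min_samples counts the point itself, as in the usual formulation.
    """
    n = len(points)
    labels = [None] * n  # None = not yet visited
    # Precompute eps-neighbourhoods (O(n^2), acceptable for a small sketch).
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable non-core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours[j])
        cluster += 1
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over non-noise points (None if < 2 clusters)."""
    idx = [i for i, lab in enumerate(labels) if lab != NOISE]
    clusters = sorted({labels[i] for i in idx})
    if len(clusters) < 2:
        return None
    total = 0.0
    for i in idx:
        # Mean distance from point i to the other members of each cluster.
        mean_dist = {}
        for c in clusters:
            others = [j for j in idx if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(points[i], points[j]) for j in others) / len(others)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: coefficient taken as 0
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(idx)


def search_params(points, eps_grid, min_samples_grid):
    """Exhaustive grid search: a simple stand-in, NOT the paper's deep-RL search.

    Returns (score, eps, min_samples, labels) for the best-scoring pair, or None.
    """
    best = None
    for eps, m in product(eps_grid, min_samples_grid):
        labels = dbscan(points, eps, m)
        score = silhouette(points, labels)
        if score is not None and (best is None or score > best[0]):
            best = (score, eps, m, labels)
    return best


if __name__ == "__main__":
    # Two tight groups plus one distant point that should end up as noise.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (50, 50)]
    score, eps, m, labels = search_params(data, [0.5, 1.5, 5.0], [2, 3])
    print(eps, m, labels)  # 1.5 2 [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In the example, `eps = 0.5` leaves every point as noise, so it is skipped. The remaining pairs all yield the same two clusters, and ties keep the first pair found. A learned search, such as the reinforcement-learning approach in the abstract, would replace the exhaustive loop when the grid becomes too large to enumerate.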

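Key words: anomaly detection, deep reinforcement learning, data feature transformation, data clustering, cumulative probability fluctuation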

Abstract:

With the increasing complexity of multidimensional data features, existing anomaly detection methods are limited in their ability to capture feature distributions. At the same time, traditional clustering and statistical methods face greater challenges in parameter selection; together, these factors limit improvements in detection performance. To address these issues, an anomaly detection method based on cumulative probability fluctuation and automated clustering was proposed. Firstly, the cumulative probability fluctuation of each feature was calculated to characterize the feature's Gaussian mixture distribution, and the features were compression-transformed according to their cumulative probability fluctuation values. Secondly, deep reinforcement learning was employed to search for the optimal clustering parameters of Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and the compression-transformed dataset was clustered. Finally, the clustering results were combined with the cumulative probability fluctuation values of the data features to determine whether each data point is anomalous. Experimental results show that, on six experimental datasets, the average precision, recall, F1-score, and Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of the proposed method are 36.39%, 2.73%, 14.90%, and 4.84% higher, respectively, than those of the best-performing comparison method. Thus, the proposed method effectively improves the overall performance of anomaly detection on data with complex multidimensional features without manual parameter selection.
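The reported gains are averages of precision, recall, F1-score, and AUC over six datasets. The following minimal sketch shows how these standard metrics are usually defined for binary anomaly labels (1 = anomaly) and averages each metric across datasets. AUC is computed as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, with ties counted as one half. The function names and the toy labels and scores are hypothetical, and the sketch does not reproduce the paper's evaluation pipeline.

```python
from statistics import mean


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1, treating label 1 as 'anomaly'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC AUC via pairwise ranking: P(score of an anomaly > score of a normal point).

    Ties count as one half. O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomaly and one normal point")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_over_datasets(per_dataset):
    """Mean of each metric across datasets (list of dicts with identical keys)."""
    return {k: mean(d[k] for d in per_dataset) for k in per_dataset[0]}


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1]              # ground truth (1 = anomaly)
    y_pred = [0, 0, 1, 0, 1, 0]              # hard predictions
    scores = [0.1, 0.2, 0.7, 0.3, 0.9, 0.4]  # anomaly scores
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(p, r, f1, roc_auc(y_true, scores))  # 0.5 0.5 0.5 0.875
    print(average_over_datasets([{"f1": 0.5}, {"f1": 1.0}]))  # {'f1': 0.75}
```

On the toy data, one anomaly is caught, one is missed, and one normal point is flagged. Precision, recall, and F1 are therefore all 0.5. AUC is 0.875 because 7 of the 8 anomaly/normal pairs are ranked correctly.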

Key words: anomaly detection, deep reinforcement learning, data feature transformation, data clustering, cumulative probability fluctuation
