Journal of Computer Applications, 2024, Vol. 44, Issue (10): 3097-3104. DOI: 10.11772/j.issn.1001-9081.2023101419

• Data science and technology •

Fuzzy multi-granularity anomaly detection for incomplete mixed data

Yuhao TANG, Dezhong PENG, Zhong YUAN

  1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
  • Received: 2023-10-20 Revised: 2024-01-20 Accepted: 2024-01-26 Online: 2024-10-15 Published: 2024-10-10
  • Contact: Zhong YUAN
  • About author:TANG Yuhao, born in 1999, M. S. candidate. His research interests include anomaly detection.
    PENG Dezhong, born in 1975, Ph. D., professor. His research interests include artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (62306196); Sichuan Provincial Science and Technology Plan Project (2023YFQ0020); Fundamental Research Funds for Central Universities (YJ202245)

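Fuzzy multi-granularity anomaly detection for incomplete mixed data

TANG Yuhao, PENG Dezhong, YUAN Zhong

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
  • Corresponding author: YUAN Zhong
  • About the authors: TANG Yuhao (born 1999), male, from Chongqing, M. S. candidate. Research interest: anomaly detection.
    PENG Dezhong (born 1975), male, from Chengdu, Sichuan, Ph. D., professor, CCF member. Research interest: artificial intelligence.
    YUAN Zhong (born 1991), male, from Leshan, Sichuan, Ph. D., associate research fellow, CCF member. Research interest: anomaly detection. yuanzhong@scu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62306196); Sichuan Provincial Science and Technology Plan Project (2023YFQ0020); Fundamental Research Funds for Central Universities (YJ202245)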

Abstract:

Most existing anomaly detection methods cannot effectively handle incomplete mixed data. To address this problem, a fuzzy multi-granularity anomaly detection algorithm for incomplete mixed data, ADFIIS (Anomaly Detection in Fuzzy Incomplete Information System), was designed. The algorithm took into account missing values in both nominal and numeric attributes, and could handle mixed attribute data. Firstly, the fuzzy similarity between attributes was defined. Secondly, the fuzzy entropy of each attribute was calculated and, based on the entropy values, a multi-granularity approach was employed to construct multiple attribute sequences. Thirdly, the outlier score of each sample was calculated to characterize its degree of anomaly. Finally, the corresponding ADFIIS algorithm was designed, and its complexity was analyzed. Experiments were conducted on publicly available datasets, and the proposed algorithm was compared with mainstream outlier detection algorithms such as ILGNI (Incomplete Local and Global Neighborhood Information network). Experimental results show that ADFIIS has better Receiver Operating Characteristic (ROC) curve performance on incomplete mixed datasets. The average Area Under the ROC Curve (AUC) of ADFIIS is better than that of 90% of the comparison methods. Compared with ILGNI, which can also handle incomplete mixed data, ADFIIS improves the average AUC by 7 percentage points. The proposed algorithm uses the model expansion method to detect anomalies in incomplete datasets without changing the original datasets, which expands the application scope of anomaly detection.
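
The first two steps of the abstract (fuzzy similarity under missing values, then per-attribute fuzzy entropy) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption rather than the ADFIIS definitions: a missing value is treated as fully similar to any value (a tolerance-style convention), numeric values are assumed to be scaled to [0, 1] beforehand, nominal values match exactly or not at all, and the entropy is the commonly used fuzzy information entropy -(1/n) Σ log2(|[x_i]| / n). The names `similarity`, `relation_matrix`, `fuzzy_entropy` and `attribute_sequence` are hypothetical. How the paper derives its multiple attribute sequences is not stated in the abstract, so the sketch only produces a single ascending-entropy ordering.

```python
import math

MISSING = None  # assumed marker for a missing value (not from the paper)


def similarity(a, b, numeric):
    """Assumed fuzzy similarity of two values on one attribute, in [0, 1]."""
    if a is MISSING or b is MISSING:
        return 1.0  # assumption: a missing value is compatible with anything
    if numeric:
        return 1.0 - abs(a - b)  # assumes values were scaled to [0, 1]
    return 1.0 if a == b else 0.0  # nominal: exact match or no match


def relation_matrix(column, numeric):
    """n x n fuzzy relation induced by a single attribute column."""
    return [[similarity(a, b, numeric) for b in column] for a in column]


def fuzzy_entropy(relation):
    """Fuzzy information entropy: (1/n) * sum_i log2(n / |[x_i]|).

    |[x_i]| is the row sum, i.e. the fuzzy cardinality of the
    similarity class of sample i. It is at least 1 because every
    sample is fully similar to itself, so the logarithm is defined.
    """
    n = len(relation)
    return sum(math.log2(n / sum(row)) for row in relation) / n


def attribute_sequence(columns, numeric_flags):
    """Return attribute indices sorted by ascending entropy, plus the entropies."""
    entropies = [
        fuzzy_entropy(relation_matrix(col, flag))
        for col, flag in zip(columns, numeric_flags)
    ]
    order = sorted(range(len(columns)), key=lambda k: entropies[k])
    return order, entropies


# Toy data: 4 samples, one numeric and one nominal attribute, each with a gap.
columns = [
    [0.0, 0.1, MISSING, 1.0],  # numeric, already in [0, 1]
    ["a", "a", "a", MISSING],  # nominal
]
order, entropies = attribute_sequence(columns, [True, False])
print(order, entropies)
```

In this toy data the nominal attribute cannot separate any pair of samples, so its entropy is zero and it comes first in the ordering. The numeric attribute distinguishes samples and therefore has positive entropy.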

Key words: fuzzy rough set, multi-granularity, anomaly detection, outlier detection, incomplete mixed data

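Abstract (translated from the Chinese):

To address the problem that most existing anomaly detection methods cannot effectively handle incomplete mixed data, a fuzzy multi-granularity anomaly detection algorithm for incomplete mixed data, ADFIIS (Anomaly Detection in Fuzzy Incomplete Information System), is proposed. The algorithm accounts for missing values in both nominal and numeric attributes and can handle mixed attribute data. First, the fuzzy similarity between attributes is defined. Second, the fuzzy entropy of each attribute is calculated, and multiple attribute sequences are constructed from the entropy values using the multi-granularity idea. Third, the anomaly score of each sample is calculated to characterize its degree of anomaly. Finally, the corresponding ADFIIS algorithm is designed and its complexity is analyzed. Experiments were conducted on public datasets, comparing the proposed algorithm with mainstream outlier detection algorithms such as ILGNI (Incomplete Local and Global Neighborhood Information network). The results show that ADFIIS achieves better Receiver Operating Characteristic (ROC) curve performance on incomplete mixed datasets. The average Area Under the Curve (AUC) of ADFIIS is better than that of 90% of the comparison methods, and compared with ILGNI, which can also handle incomplete mixed data, its average AUC is 7 percentage points higher. The proposed algorithm uses the model expansion method to detect anomalies in incomplete datasets without changing the original datasets, which broadens the applicability of anomaly detection.

Key words (translated from the Chinese): fuzzy rough set, multi-granularity, anomaly detection, outlier detection, incomplete mixed data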

CLC Number: