Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (9): 2622-2627. DOI: 10.11772/j.issn.1001-9081.2020010126

• Data science and technology •

Evaluation metrics of outlier detection algorithms

NING Jin1,2, CHEN Leiting1,2,3, LUO Zijuan4, ZHOU Chuan1,2, ZENG Huiru1,2   

  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 611731, China;
    2. Digital Media Technology Key Laboratory of Sichuan Province(University of Electronic Science and Technology of China), Chengdu Sichuan 611731, China;
    3. Institute of Electronic and Information Engineering in Guangdong, University of Electronic Science and Technology of China, Dongguan Guangdong 523808, China;
    4. Information System Engineering Laboratory, The 28th Research Institute of China Electronics Technology Group Corporation, Nanjing Jiangsu 210007, China
  • Received:2020-02-13 Revised:2020-04-22 Online:2020-09-10 Published:2020-04-28
  • Supported by:
    This work is partially supported by the Sichuan Science and Technology Project (2019YJ0177, 2019YJ0176, 2019YFQ0005).

离群点检测算法的评价指标

宁进1,2, 陈雷霆1,2,3, 罗子娟4, 周川1,2, 曾慧茹1,2   

  1. 电子科技大学 计算机科学与工程学院, 成都 611731;
    2. 数字媒体技术四川省重点实验室(电子科技大学), 成都 611731;
    3. 电子科技大学 广东电子信息工程研究院, 广东 东莞 523808;
    4. 中国电子科技集团公司第二十八研究所 信息系统工程重点实验室, 南京 210007
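  • Corresponding author: ZHOU Chuan
  • About the authors: NING Jin (born 1991), female, from Chengdu, Sichuan, Ph.D. candidate; research interests: outlier detection, data mining. CHEN Leiting (born 1966), male, from Chengdu, Sichuan, professor, Ph.D.; research interests: image processing, computer graphics, virtual reality. LUO Zijuan (born 1985), female, from Taihe, Jiangxi, senior engineer, M.S.; research interests: target recognition in remote sensing images, image change detection. ZHOU Chuan (born 1977), male, from Chengdu, Sichuan, lecturer, Ph.D.; research interests: computer animation, machine vision, medical image analysis, data mining. ZENG Huiru (born 1994), female, from Ganzhou, Jiangxi, Ph.D. candidate; research interests: deep learning, medical informatics.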
  • 基金资助:
    四川省科技计划项目(2019YJ0177,2019YJ0176,2019YFQ0005)。

Abstract: As outlier detection has been studied in depth and applied widely, many effective algorithms have been proposed. However, these algorithms are still evaluated with the metrics of traditional classification, so the available metrics are few and adapt poorly to outlier detection scenarios. To address these problems, two metrics were proposed: the first-type High True positive rate-Area Under Curve (HT_AUC) and the second-type Low False positive rate-Area Under Curve (LF_AUC). First, the commonly used evaluation metrics for outlier detection were reviewed, and their advantages, disadvantages and applicable scenarios were analyzed. Then, based on the existing Area Under Curve (AUC) method, HT_AUC and LF_AUC were proposed for the high True Positive Rate (TPR) requirement and the low False Positive Rate (FPR) requirement respectively, providing more suitable metrics for evaluating outlier detection algorithms and for their quantitative integration. Experimental results on real-world datasets show that the proposed metrics satisfy the first-type high TPR requirement and the second-type low FPR requirement better than the traditional evaluation metrics.

Key words: outlier detection, evaluation metric, Area Under Curve (AUC), True Positive Rate (TPR), False Positive Rate (FPR)
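For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of how TPR, FPR and the standard AUC are computed from outlier scores, together with a generic partial area restricted to a low-FPR region. It is an illustration only: the abstract does not give the definitions of HT_AUC or LF_AUC, so the partial area shown here should not be read as either of them. The function names `roc_points` and `area_under`, the parameter `max_fpr`, and the toy data are all assumptions made for this example.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    labels: 1 = outlier (positive class), 0 = inlier.
    scores: higher score = more outlying. Tied scores are handled together.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one outlier and one inlier")
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume every sample that shares this score before emitting a point.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    max_fpr=1.0 gives the standard AUC; a smaller value gives a generic
    partial area (an illustration, not the paper's LF_AUC).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): 3 outliers among 8 samples.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
pts = roc_points(labels, scores)
full_auc = area_under(pts)                   # standard AUC
low_fpr_area = area_under(pts, max_fpr=0.2)  # area over FPR <= 0.2 only
```

On this toy data the full AUC equals the fraction of (outlier, inlier) pairs ranked correctly, 14 of 15. Two detectors with the same full AUC can differ in the restricted area, which is the kind of gap a requirement-specific metric is meant to expose.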

摘要: 随着离群点检测技术的深入研究和广泛应用,越来越多的优秀算法被提出来,然而,现有的离群点检测技术的评价仍然沿用传统分类算法的测量指标,存在着评价指标单一、适应性差的问题。针对这些问题,提出了一类高真正率指标(HT_AUC)和二类低假正率指标(LF_AUC)。首先,整理常用的离群点检测评价指标,分析其优缺点和适用场景;然后,在已有的曲线下面积(AUC)方法的基础上,分别针对高真正率(TPR)要求和低假正率(FPR)要求,提出了一类高真正率指标和二类低假正率指标,为离群点检测算法的效果评价和量化集成提供了更合适的指标。在真实数据集上的实验结果表明,与传统评价指标的相比,所提出的方法更能满足一类高真正率和二类低假正率要求。

关键词: 离群点检测, 评价指标, 曲线下面积, 真正率, 假正率
