Journal of Computer Applications, 2021, Vol. 41, Issue (8): 2212-2218. DOI: 10.11772/j.issn.1001-9081.2020101542

Special Issue: Artificial intelligence

• Artificial intelligence •

Distribution entropy penalized support vector data description

HU Tianjie1, HU Wenjun1, WANG Shitong2   

  1. School of Information Engineering, Huzhou University, Huzhou Zhejiang 313000, China;
    2. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi Jiangsu 214122, China
  • Received: 2020-10-08 Revised: 2021-01-30 Online: 2021-08-10 Published: 2021-02-24
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61772198) and the Basic Public Welfare Research Program of Zhejiang Province (LGN18F020002).

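  • Corresponding author: HU Tianjie
  • About the authors: HU Tianjie (born 1997), male, from Xuancheng, Anhui, M.S. candidate; main research interests: pattern recognition, fault diagnosis. HU Wenjun (born 1977), male, from Jixi, Anhui, professor, Ph.D., CCF member; main research interests: machine learning, pattern recognition, intelligent systems. WANG Shitong (born 1964), male, from Yangzhou, Jiangsu, professor, M.S.; main research interests: pattern recognition, data mining, fuzzy systems.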

Abstract: To address the sensitivity of traditional Support Vector Data Description (SVDD) to its penalty parameters, a new anomaly detection method, called Distribution Entropy Penalized SVDD (DEP-SVDD), was proposed. First, the normal samples were taken as the global distribution of the data, and a distance measure between each sample point and the center of the normal sample distribution was defined in the Gaussian kernel space. Then, based on this distance, a probability was defined for every sample point to estimate whether the point belongs to the normal or the abnormal samples. Finally, this probability was used to construct a distribution-entropy-based penalty degree, which was applied to penalize the corresponding samples. On 9 real-world datasets, the proposed method was compared with SVDD, Density Weighted SVDD (DW-SVDD), Position regularized SVDD (P-SVDD), K-Nearest Neighbor (KNN) and isolation Forest (iForest). The results show that DEP-SVDD achieves the highest classification precision on 6 of the 9 datasets, indicating that it outperforms these anomaly detection methods on most of the datasets tested.

Key words: anomaly detection, Support Vector Data Description (SVDD), Gaussian kernel, distribution entropy, classification
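
The three steps summarized in the abstract can be illustrated with a minimal NumPy sketch. Only the first step is standard: the squared distance from a sample to the mean of all samples in Gaussian kernel space follows directly from the kernel trick. The abstract does not give the paper's formulas for the probability or for the entropy-based penalty, so the linear probability mapping and the per-sample entropy term below are placeholder assumptions chosen for illustration, and the function names (`kernel_distance_to_center`, `entropy_penalties`) are hypothetical rather than the authors' own.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def kernel_distance_to_center(X, sigma=1.0):
    """Squared kernel-space distance from each sample to the sample mean.

    ||phi(x_i) - mu||^2 = k(x_i, x_i) - (2/n) sum_j k(x_i, x_j)
                          + (1/n^2) sum_{j,l} k(x_j, x_l)
    """
    K = gaussian_kernel(X, X, sigma)
    d2 = np.diag(K) - 2.0 * K.mean(axis=1) + K.mean()
    return np.maximum(d2, 0.0)  # guard against tiny negative round-off

def entropy_penalties(d2, eps=1e-12):
    """Placeholder probability and entropy-based weights (NOT the paper's formulas)."""
    # ASSUMPTION: the probability of being normal falls linearly with distance.
    p = np.clip(1.0 - d2 / (d2.max() + eps), eps, 1.0)
    # ASSUMPTION: normalize to a distribution over samples and use each
    # sample's entropy contribution -q_i * log(q_i) as its penalty weight.
    q = p / p.sum()
    w = -q * np.log(q)
    return p, w

# Toy data: 30 points near the origin plus one distant point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(30, 2)), [[5.0, 5.0]]])
d2 = kernel_distance_to_center(X, sigma=1.0)
p, w = entropy_penalties(d2)
print("farthest sample index:", int(np.argmax(d2)))
print("smallest penalty weight index:", int(np.argmin(w)))
```

Under these placeholder choices, a sample far from the distribution center receives a low probability and a small weight, so a weighted SVDD would penalize its slack less than that of samples near the center. Whether DEP-SVDD weights samples in exactly this direction or with this functional form is not stated in the abstract.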

