Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (12): 3808-3814. DOI: 10.11772/j.issn.1001-9081.2023121809

• Artificial Intelligence •

Efficient active-set method for support vector data description problem with Gaussian kernel

Qiye ZHANG, Xinrui ZENG

  1. School of Mathematical Sciences, Beihang University, Beijing 102206, China
  • Received: 2024-01-02 Revised: 2024-04-12 Accepted: 2024-04-18 Online: 2024-06-13 Published: 2024-12-10
  • Corresponding author: Xinrui ZENG
  • About the author: ZHANG Qiye, born in 1975 in Luoyang, Henan, Ph. D., associate professor. Her research interests include optimization theory and machine learning.
  • Supported by:
    Special Fund for Graduate Education and Development Research of Beihang University (JG2023014)

Abstract:

When the active-set method is applied to the Support Vector Data Description (SVDD) problem, each iteration is computationally expensive and therefore inefficient in large-scale data scenarios. To address this, an efficient active-set method for the SVDD problem with a Gaussian kernel (ASM-SVDD) was designed. First, by exploiting the special structure of the constraints in the SVDD dual model, a dimension-reduced equality-constrained subproblem was solved at each iteration. Second, the active set was updated through matrix operations, and each update involved only the current support vectors and a single sample point, which reduced the computational cost dramatically. In addition, since ASM-SVDD is a variant of the classical active-set method, its finite termination follows from active-set theory. Finally, the performance of ASM-SVDD was verified on simulated and real datasets. The results show that ASM-SVDD improves model performance effectively as the number of training rounds increases. Compared with FISVDD (Fast Incremental SVDD), a fast incremental algorithm for the SVDD problem, ASM-SVDD reduces the trained objective value by 25.9% and improves the identification of support vectors by 10.0% on shuttle, a typical low-dimensional, large-sample dataset. Meanwhile, ASM-SVDD achieves higher F1 scores than FISVDD on all tested datasets, with an improvement of up to 0.07% on the very large-scale dataset criteo. Hence, while detecting outliers, ASM-SVDD trains a more stable hypersphere and classifies test samples more accurately, making it well suited to outlier detection in large-scale data scenarios.
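With a Gaussian kernel every diagonal entry K(x_i, x_i) equals 1, so the SVDD dual collapses to minimizing α'Kα over the simplex, and the equality-constrained subproblem restricted to a working set S has the closed form α_S = K_S⁻¹1 / (1'K_S⁻¹1). The following is a toy sketch of that generic structure, not the paper's ASM-SVDD itself: the `active_set_svdd` loop, the C → ∞ simplification (no upper bound on the multipliers), and all tolerances are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def equality_subproblem(K_S):
    """Solve min a' K_S a  s.t.  sum(a) = 1 on the working set S.
    The KKT system K_S a = lam * 1, sum(a) = 1 gives
        a = K_S^{-1} 1 / (1' K_S^{-1} 1)."""
    v = np.linalg.solve(K_S, np.ones(K_S.shape[0]))
    return v / v.sum()

def active_set_svdd(X, sigma=1.0, max_iter=50):
    """Toy active-set loop for min a' K a, sum(a) = 1, a >= 0.
    Grows the working set by the most violated point and drops negative
    multipliers; each re-solve touches only the current working set."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    S = [0]                              # start from a single sample
    a = np.zeros(n)
    a[0] = 1.0
    for _ in range(max_iter):
        # At optimality (K a)_i >= lam for all i, with equality on the
        # support, so the minimum of K a must be attained on S.
        g = K @ a
        i_new = int(np.argmin(g))
        if i_new in S or g[i_new] >= g[S].min() - 1e-10:
            break                        # no violated point: KKT satisfied
        S.append(i_new)
        while True:
            a_S = equality_subproblem(K[np.ix_(S, S)])
            if (a_S >= -1e-12).all():
                break
            S.pop(int(np.argmin(a_S)))   # drop the most negative multiplier
        a = np.zeros(n)
        a[S] = np.maximum(a_S, 0.0)
    return a, sorted(S)
```

Note how the expensive work per iteration is a linear solve whose size is only the current number of support vectors, which is the kind of per-iteration saving the abstract describes.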

Key words: Support Vector Data Description (SVDD), quadratic programming, active-set method, outlier detection, finite termination


CLC number: