Journal of Computer Applications, 2019, Vol. 39, Issue (11): 3134-3139. DOI: 10.11772/j.issn.1001-9081.2019050823

• Paper from the 2019 China Conference on Granular Computing and Knowledge Discovery (CGCKD2019) •

Support vector data description method based on probability

YANG Chen1,2,3, WANG Jieting1,2,3, LI Feijiang1,2,3, QIAN Yuhua1,2,3

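  1. Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan Shanxi 030006, China;
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China;
    3. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China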
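  • Received: 2019-05-06  Revised: 2019-05-27  Published: 2019-11-10  Online: 2019-09-11
  • Corresponding author: QIAN Yuhua
  • About the authors: YANG Chen (1996-), female, born in Linfen, Shanxi, M. S. candidate; research interest: statistical machine learning theory. WANG Jieting (1991-), female, born in Linfen, Shanxi, Ph. D. candidate, CCF member; research interests: statistical machine learning theory, reinforcement learning. LI Feijiang (1990-), male, born in Jincheng, Shanxi, Ph. D. candidate, CCF member; research interests: ensemble learning, unsupervised learning. QIAN Yuhua (1976-), male, born in Jincheng, Shanxi, professor, Ph. D., CCF member; research interests: machine learning, complex networks, rough set theory, granular computing.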
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672332), the Program for the Outstanding Innovative Teams of Higher Learning Institutions of Shanxi, the Program for the San Jin Young Scholars of Shanxi, and the Overseas Returnee Research Program of Shanxi Province (2017023).

Abstract: Current probabilistic machine learning methods have high complexity when solving probability problems, and the traditional Support Vector Data Description (SVDD), as a kernel density estimation method, can only determine whether a test sample belongs to the described class. To address these problems, a probability-based SVDD method was proposed. Firstly, the traditional SVDD method was used to obtain a data description for each of the two classes, and the distance from a test sample to each hypersphere was calculated. Then, a function converting this distance into a probability was constructed, yielding the probability-based SVDD method. In addition, the Bagging algorithm was used to build an ensemble that further improves the performance of data description. Following a classification setting, the proposed method was compared with the traditional SVDD method on 13 benchmark datasets from Gunnar Raetsch. The experimental results show that the proposed method outperforms the traditional SVDD method in accuracy and F1-score, and that its data description performance is improved.

Key words: probabilistic machine learning, Support Vector Data Description (SVDD), ensemble, uncertainty, classification

CLC number:
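The abstract names three steps: one SVDD hypersphere per class, a function that turns distances into a probability, and Bagging. It does not give the probability function or the solver. The Python sketch below assumes NumPy and a Gaussian (RBF) kernel and illustrates the pipeline under those assumptions. The squared distance from a test point z to a sphere's centre is computed in kernel space as K(z, z) - 2 Σ_i α_i K(x_i, z) + Σ_i Σ_j α_i α_j K(x_i, x_j). Several parts are hypothetical stand-ins chosen for illustration, not the authors' method: the names `SVDD`, `class_prob` and `bagged_prob`, the SMO-style dual solver, and especially the logistic map of radius-normalised distances in `class_prob`.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class SVDD:
    """Minimal SVDD solved in the dual with pairwise (SMO-style) updates.

    Dual, written as a minimisation: min_a a^T K a - a^T diag(K)
    subject to 0 <= a_i <= C and sum(a) = 1.
    """

    def __init__(self, C=0.1, gamma=0.5, tol=1e-6, max_iter=5000):
        self.C, self.gamma, self.tol, self.max_iter = C, gamma, tol, max_iter

    def fit(self, X):
        n = len(X)
        C = max(self.C, 1.0 / n)  # C >= 1/n keeps the constraint set non-empty
        K = rbf_kernel(X, X, self.gamma)
        a = np.full(n, 1.0 / n)  # feasible starting point
        g = 2.0 * K @ a - np.diag(K)  # gradient of the dual objective
        for _ in range(self.max_iter):
            up = np.flatnonzero(a < C - 1e-12)  # weights that may still grow
            down = np.flatnonzero(a > 1e-12)  # weights that may still shrink
            if up.size == 0 or down.size == 0:
                break
            i = up[np.argmin(g[up])]
            j = down[np.argmax(g[down])]
            gap = g[j] - g[i]
            if gap < self.tol:  # KKT conditions satisfied up to tol
                break
            curv = K[i, i] + K[j, j] - 2.0 * K[i, j]
            t_max = min(C - a[i], a[j])
            # exact line search along e_i - e_j, clipped to the box
            t = t_max if curv <= 1e-12 else min(gap / (2.0 * curv), t_max)
            a[i] += t
            a[j] -= t
            g += 2.0 * t * (K[:, i] - K[:, j])
        self.X, self.alpha, self.aKa = X, a, a @ K @ a
        d2 = self.dist2(X)
        free = (a > 1e-8) & (a < C - 1e-8)  # unbounded SVs lie on the sphere
        self.R2 = d2[free].mean() if free.any() else d2[a > 1e-8].mean()
        return self

    def dist2(self, Z):
        """Squared kernel-space distance from each row of Z to the centre."""
        Kz = rbf_kernel(Z, self.X, self.gamma)
        return 1.0 - 2.0 * Kz @ self.alpha + self.aKa  # K(z, z) = 1 for RBF


def class_prob(model_pos, model_neg, Z):
    """HYPOTHETICAL distance-to-probability map (not the paper's function):
    divide each squared distance by its sphere's R^2, then apply a logistic
    to the difference, so P(+) = 0.5 when both ratios are equal."""
    r_pos = model_pos.dist2(Z) / model_pos.R2
    r_neg = model_neg.dist2(Z) / model_neg.R2
    return 1.0 / (1.0 + np.exp(r_pos - r_neg))


def bagged_prob(X_pos, X_neg, Z, n_estimators=10, seed=0, **svdd_kw):
    """Generic Bagging: fit both spheres on bootstrap resamples, average P(+)."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_estimators):
        bp = X_pos[rng.integers(0, len(X_pos), len(X_pos))]
        bn = X_neg[rng.integers(0, len(X_neg), len(X_neg))]
        probs.append(class_prob(SVDD(**svdd_kw).fit(bp), SVDD(**svdd_kw).fit(bn), Z))
    return np.mean(probs, axis=0)


# Usage on synthetic two-class data (illustrative only).
rng = np.random.default_rng(42)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(40, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(40, 2))
Z = np.array([[2.0, 2.0], [-2.0, -2.0], [0.0, 0.0]])
p = bagged_prob(X_pos, X_neg, Z, n_estimators=5, C=0.1, gamma=0.5)
print(np.round(p, 3))  # P(+) per test point; threshold at 0.5 to classify
```

Dividing each distance by its own R² lets a tight sphere and a loose sphere be compared on one scale. Any monotone map of the two ratios would serve the same illustrative purpose here.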