Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (9): 2499-2504. DOI: 10.11772/j.issn.1001-9081.2019020763

• Artificial intelligence

Cost-sensitive active learning through farthest distance sum sampling

REN Jie<sup>1</sup>, MIN Fan<sup>1</sup>, WANG Min<sup>2</sup>   

  1. School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China;
  2. School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu, Sichuan 610500, China
  • Received: 2019-03-22 Revised: 2019-05-06 Online: 2019-09-10 Published: 2019-06-03
  • Supported by:

    This work is partially supported by the Scientific Innovation Group for Youths of Sichuan Province (2019JDTD0017) and the Applied Basic Research Project of Sichuan Province (2017JY0190).

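  • Corresponding author: MIN Fan (闵帆)
  • About the authors: REN Jie (任杰), born in 1996, male, from Xinzhou, Shanxi, master's student; research interest: active learning. MIN Fan (闵帆), born in 1973, male, from Chongqing, professor, Ph.D., CCF member; research interests: granular computing, recommender systems, active learning. WANG Min (汪敏), born in 1980, female, from Shaoyang, Hunan, associate professor, M.S., CCF member; research interests: data mining, active learning.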

Abstract:

Active learning aims to reduce expert labeling through human-machine interaction, while cost-sensitive active learning focuses on balancing labeling and misclassification costs. Based on the Three-Way Decision (3WD) methodology and the Label Uniform Distribution (LUD) model, a Cost-sensitive Active learning through Farthest distance sum Sampling (CAFS) algorithm was proposed. Firstly, a farthest distance sum sampling strategy was designed to query the labels of representative samples. Secondly, the LUD model and a cost function were used to calculate the expected number of samples to query. Finally, the k-Means algorithm was employed to split blocks whose queried instances had received different labels. In CAFS, the 3WD methodology was applied iteratively to label query, instance prediction, and block splitting until all instances were processed. The learning process was controlled by the cost minimization objective. Results on 9 public datasets show that CAFS achieves a lower average cost than 11 mainstream algorithms.

Key words: active learning, k-Means clustering, label uniform distribution, Three-Way Decision (3WD)
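
The abstract names the sampling strategy and the three-way loop but does not give their exact definitions, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. The function names `farthest_distance_sum_sample` and `three_way_action`, the seeding rule for the first query, and the purity-based decision rule are all assumptions made for illustration.

```python
import math


def farthest_distance_sum_sample(points, n_queries):
    """Greedily pick indices of points to query within one block.

    Each new pick is the unselected point whose summed Euclidean distance
    to all previously selected points is largest (assumed reading of
    "farthest distance sum sampling").
    """
    n = len(points)
    if n == 0 or n_queries <= 0:
        return []
    # Assumed seeding rule: start from the point with the largest total
    # distance to every point in the block.
    first = max(range(n),
                key=lambda i: sum(math.dist(points[i], q) for q in points))
    selected = [first]
    # dist_sum[i] accumulates the distance from point i to all selected points.
    dist_sum = [math.dist(p, points[first]) for p in points]
    while len(selected) < min(n_queries, n):
        chosen = set(selected)
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: dist_sum[i])
        selected.append(nxt)
        for i in range(n):
            dist_sum[i] += math.dist(points[i], points[nxt])
    return selected


def three_way_action(queried_labels, expected_queries):
    """Choose one of three actions for a block (hypothetical decision rule).

    "split"   : queried labels disagree, so the block is split (e.g. by k-Means).
    "predict" : enough queries were made and all labels agree.
    "query"   : labels agree so far, but more queries are still expected.
    """
    if len(set(queried_labels)) > 1:
        return "split"
    if len(queried_labels) >= expected_queries:
        return "predict"
    return "query"
```

In this sketch, `expected_queries` stands in for the expected sampling number that the abstract says is derived from the LUD model and the cost function; how that number is computed is not stated in the abstract and is left out here.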

CLC Number: