Journal of Computer Applications, 2019, Vol. 39, Issue (9): 2499-2504. DOI: 10.11772/j.issn.1001-9081.2019020763

• Artificial intelligence

Cost-sensitive active learning through farthest distance sum sampling

REN Jie¹, MIN Fan¹, WANG Min²

  1. School of Computer Science, Southwest Petroleum University, Chengdu Sichuan 610500, China;
    2. School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu Sichuan 610500, China
  • Received: 2019-03-22  Revised: 2019-05-06  Online: 2019-09-10  Published: 2019-06-03
  • Supported by:

    This work is partially supported by the Scientific Innovation Group for Youths of Sichuan Province (2019JDTD0017) and the Applied Basic Research Project of Sichuan Province (2017JY0190).


  • Corresponding author: MIN Fan
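  • About the authors: REN Jie (1996-), male, born in Xinzhou, Shanxi, M.S. candidate. His research interests include active learning. MIN Fan (1973-), male, born in Chongqing, Ph.D., professor, CCF member. His research interests include granular computing, recommender systems, and active learning. WANG Min (1980-), female, born in Shaoyang, Hunan, M.S., associate professor, CCF member. Her research interests include data mining and active learning.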

Active learning aims to reduce the expert labeling effort through human-machine interaction, while cost-sensitive active learning focuses on balancing labeling costs against misclassification costs. Based on the Three-Way Decision (3WD) methodology and the Label Uniform Distribution (LUD) model, an algorithm named Cost-sensitive Active learning through Farthest distance sum Sampling (CAFS) was proposed. First, a farthest distance sum sampling strategy was designed to query the labels of representative samples. Second, the LUD model and the cost function were used to calculate the expected number of samples to query. Finally, the k-Means algorithm was employed to split blocks whose queried instances received different labels. In CAFS, the 3WD methodology was adopted to iterate over label querying, instance prediction, and block splitting until all instances were processed, with the whole learning process driven by the objective of cost minimization. Experimental results on 9 public datasets show that CAFS achieves a lower average cost than 11 mainstream algorithms.
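The farthest distance sum sampling step can be illustrated with a small greedy sketch. It is not the authors' implementation: the abstract specifies neither the distance metric nor how the first representative is chosen, so this sketch assumes Euclidean distance and starts from the point nearest the block's centroid. The function name `farthest_distance_sum_sampling` is hypothetical. Each later pick is the unselected point whose sum of distances to all already-selected points is largest.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def farthest_distance_sum_sampling(points: List[Point], k: int) -> List[int]:
    """Return the indices of k representative points, chosen greedily.

    Assumptions (not stated in the abstract): Euclidean distance, and the
    first representative is the point closest to the centroid of the block.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be in 1..len(points)")
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    first = min(range(n), key=lambda i: math.dist(points[i], centroid))
    selected = [first]
    # dist_sum[i] holds the total distance from point i to every selected point.
    dist_sum = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        chosen = set(selected)
        # Pick the unselected point that is farthest from the selected set in total.
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: dist_sum[i])
        selected.append(nxt)
        # Update the running sums with distances to the newly selected point.
        for i in range(n):
            dist_sum[i] += math.dist(points[i], points[nxt])
    return selected


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
    print(farthest_distance_sum_sampling(pts, 3))  # [3, 5, 0]
```

Using the sum of distances, rather than the minimum distance as in classic farthest-point sampling, rewards points that are far from the selected set as a whole; under the assumptions above, the outlier `(10, 10)` is queried second and the opposite corner `(0, 0)` third.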

Key words: active learning, k-Means clustering, label uniform distribution, Three-Way Decision (3WD)
