Journal of Computer Applications, 2011, Vol. 31, Issue (08): 2134-2137. DOI: 10.3724/SP.J.1087.2011.02134

• Artificial Intelligence •

HC_AL: an active learning method based on hierarchical clustering

Jun-fang JIA

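  1. School of Mathematics and Computer Science, Shanxi Datong University, Datong, Shanxi 037009, China
  • Received: 2011-01-06 Revised: 2011-03-14 Online: 2011-08-01 Published: 2011-08-01
  • Corresponding author: Jun-fang JIA
  • About the author: Jun-fang JIA (born 1976), female, from Zuoyun, Shanxi; lecturer. Main research interests: artificial intelligence and machine learning.
  • Funding:

    Shanxi Province Youth Science Foundation project (2011021013-2)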

HC_AL: New active learning method based on hierarchical clustering

Jun-fang JIA   

  1. School of Mathematics and Computer Science, Shanxi Datong University, Datong, Shanxi 037009, China
  • Received: 2011-01-06 Revised: 2011-03-14 Online: 2011-08-01 Published: 2011-08-01
  • Contact: Jun-fang JIA

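Abstract: To address the slow convergence of traditional active learning (AL) methods when classifying large-scale unlabeled samples, an active learning training algorithm based on hierarchical clustering (HC), the HC_AL method, is proposed. The large-scale unlabeled data are clustered hierarchically, and at each level the center of every cluster is labeled so that its label stands in for the label of the whole cluster; the cluster centers at that level whose labels turn out to be wrong are then added to the training set. Experiments on data sets showed good generalization ability and fast convergence. The results indicate that refining level by level and step by step greatly increases the convergence speed of active learning while still achieving satisfactory learning ability.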

Key words: active learning, hierarchical clustering, hierarchical refinement, stepwise refinement

Abstract: To address the slow convergence of the traditional Active Learning (AL) method when classifying large-scale unlabeled samples, a Hierarchical Clustering Active Learning (HC_AL) algorithm was proposed. In the algorithm, the large-scale unlabeled data were clustered hierarchically, and the center of each cluster at each level was labeled, with that label standing in for the label of the whole cluster. The cluster centers at that level that turned out to be wrongly labeled were then added to the training set. Experiments on the data sets show that the proposed algorithm achieves good generalization ability and fast convergence. The results also show that hierarchical, stepwise refinement greatly improves the convergence speed of active learning while obtaining relatively satisfactory learning ability.

Key words: Active Learning (AL), Hierarchical Clustering (HC), hierarchical refinement, stepwise refinement
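
The loop outlined in the abstract (cluster hierarchically, label each cluster center, add the wrongly predicted centers to the training set, then refine) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the clustering procedure, the classifier, or the stopping rule, so the 2-means split, the 1-nearest-neighbour classifier, the "stop when a level produces no errors" rule, and all names (`hc_al`, `split_in_two`, `center_of`, `predict`, `oracle`) are assumptions made only for illustration.

```python
from math import dist  # Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def center_of(cluster):
    """Cluster 'center': the real sample closest to the centroid,
    so that it can be shown to the labeling oracle."""
    c = centroid(cluster)
    return min(cluster, key=lambda p: dist(p, c))


def split_in_two(cluster, iters=10):
    """ASSUMPTION: refine a cluster with a small deterministic 2-means.
    Returns one or two non-empty sub-clusters."""
    if len(cluster) < 2:
        return [cluster]
    mean = centroid(cluster)
    c0 = max(cluster, key=lambda p: dist(p, mean))  # sample farthest from the mean
    c1 = max(cluster, key=lambda p: dist(p, c0))    # sample farthest from c0
    for _ in range(iters):
        a = [p for p in cluster if dist(p, c0) <= dist(p, c1)]
        b = [p for p in cluster if dist(p, c0) > dist(p, c1)]
        if not a or not b:
            return [cluster]  # cannot be split any further
        c0, c1 = centroid(a), centroid(b)
    return [a, b]


def predict(train, x):
    """ASSUMPTION: 1-nearest-neighbour classifier over (point, label) pairs."""
    return min(train, key=lambda item: dist(item[0], x))[1]


def hc_al(unlabeled, oracle, max_depth=6):
    """Sketch of the loop described in the abstract.

    unlabeled: list of points given as tuples of floats
    oracle:    callable returning the true label of a point (the annotator)
    Returns (training set, number of oracle queries).
    """
    clusters = [list(unlabeled)]
    train, queries = [], 0
    for _ in range(max_depth):
        errors = 0
        for cluster in clusters:
            center = center_of(cluster)
            if any(center == p for p, _ in train):
                continue  # this sample is already labeled
            label = oracle(center)
            queries += 1
            # Keep only centers that the current classifier gets wrong.
            if not train or predict(train, center) != label:
                train.append((center, label))
                errors += 1
        if errors == 0:  # ASSUMPTION: stop once a whole level is predicted correctly
            break
        # Go one level down the hierarchy: refine every cluster.
        clusters = [part for c in clusters for part in split_in_two(c)]
    return train, queries
```

In this sketch the oracle is queried only for cluster centers, so the number of labels requested grows with the number of clusters visited rather than with the size of the unlabeled pool, which is the intuition behind the faster convergence claimed in the abstract.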

CLC number: