Abstract: Traditional Active Learning (AL) methods converge slowly when classifying unlabeled samples in large-scale data. To address this, a Hierarchical Clustering Active Learning (HC_AL) algorithm is proposed. The algorithm clusters the majority of the unlabeled data hierarchically, labels the center of each cluster, and uses that label as the category label for the whole cluster at that level of the hierarchy. Samples found to be wrongly labeled are then added to the training set. Experimental results on the data sets show that the proposed algorithm improves both generalization ability and convergence speed. Through hierarchical and stepwise refinement, it greatly accelerates active learning convergence while achieving relatively satisfactory learning ability.
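The cluster-then-label step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the average-linkage criterion, the use of a medoid as the cluster "center", the fixed number of clusters, and all function names (`agglomerative_clusters`, `medoid`, `label_by_cluster_centers`) are assumptions made for illustration, and the later step of adding wrongly labeled samples to the training set is not shown.

```python
from math import dist


def agglomerative_clusters(points, n_clusters):
    """Naive average-linkage agglomerative clustering (an assumed linkage).

    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def medoid(points, members):
    """Index of the member with the smallest total distance to the others."""
    return min(members,
               key=lambda i: sum(dist(points[i], points[j]) for j in members))


def label_by_cluster_centers(points, n_clusters, oracle):
    """Query the oracle once per cluster and propagate that label."""
    labels = [None] * len(points)
    queried = []
    for members in agglomerative_clusters(points, n_clusters):
        center = medoid(points, members)
        queried.append(center)
        center_label = oracle(center)  # one labeling request per cluster
        for i in members:
            labels[i] = center_label
    return labels, queried


# Toy data: two well-separated groups; the oracle is a hypothetical annotator.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
true_labels = [0, 0, 0, 1, 1, 1]
labels, queried = label_by_cluster_centers(points, 2, lambda i: true_labels[i])
print(labels, len(queried))
```

In this toy setting only two labeling queries are issued for six samples, which reflects the motivation stated in the abstract: labeling cluster centers instead of individual samples reduces the number of queries needed before a classifier can be trained.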
贾俊芳. 基于层次聚类的主动学习方法——HC_AL[J]. 计算机应用, 2011, 31(08): 2134-2137.
Jun-fang JIA. HC_AL: New active learning method based on hierarchical clustering [J]. Journal of Computer Applications, 2011, 31(08): 2134-2137.