Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (9): 2595-2599. DOI: 10.11772/j.issn.1001-9081.2017.09.2595

• Artificial Intelligence •

Semi-supervised classification algorithm based on C-means clustering and graph transduction

WANG Na, WANG Xiaofeng, GENG Guohua, SONG Qiannan

  1. College of Information Science and Technology, Northwest University, Xi'an, Shaanxi 710000, China
  • Received: 2017-04-01 Revised: 2017-06-01 Online: 2017-09-10 Published: 2017-09-13
  • Corresponding author: WANG Xiaofeng, xfwang@nwu.edu.cn
  • About the authors: WANG Na (born 1993), female, from Yulin, Shaanxi, master's student; research interests: image processing, pattern recognition. WANG Xiaofeng (born 1979), female, from Weinan, Shaanxi, associate professor, Ph.D., CCF member; research interests: data mining, 3D model retrieval, pattern recognition. GENG Guohua (born 1955), female, from Laixi, Shandong, professor, Ph.D., CCF member; research interests: scientific computing visualization, pattern recognition, intelligent information processing. SONG Qiannan (born 1994), female, from Yuncheng, Shanxi, master's student; research interests: image processing, pattern recognition.
  • Supported by:
    This work is partially supported by the Youth Science Foundation of the National Natural Science Foundation of China (61602380), the General Program of the National Natural Science Foundation of China (61373117, 61673319), and the Shaanxi Province International Cooperation Project (2013KW04-04).

Abstract: To address the heavy computation and limited accuracy of the traditional Graph Transduction (GT) algorithm, a semi-supervised classification algorithm based on C-means clustering and graph transduction was proposed. Firstly, the Fuzzy C-Means (FCM) clustering algorithm was used to pre-select unlabeled samples, narrowing the data set over which the GT algorithm builds its graph. Then, a k-nearest-neighbor sparse graph was constructed to reduce spurious connections in the similarity matrix and thereby shorten graph construction time, and the labels of the pre-selected unlabeled samples were obtained by label propagation. Finally, under the semi-supervised manifold assumption, the expanded labeled data set and the remaining unlabeled data set were used to train the classifier, which produced the final classification result. On the Weizmann Horse data set, the classification accuracy of the proposed algorithm exceeded 96% in all cases; compared with the traditional classification method that uses GT alone, it removed the dependence on the initial labeled set and increased accuracy by at least 10%. Applied directly to the Terracotta Warriors data set, the proposed algorithm reached a classification accuracy above 95%, clearly higher than that of the traditional graph transduction algorithm. The experimental results show that the semi-supervised classification algorithm based on C-means clustering and graph transduction classifies images well and is of research value for accurate image classification.

Key words: C-means clustering, Graph Transduction (GT), semi-supervised classification, similarity matrix, sparse graph
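The first two stages described in the abstract (FCM pre-selection, then a k-nearest-neighbor sparse graph with label propagation) can be illustrated with the following minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the pre-selection criterion, the similarity kernel, or the propagation rule, so the sketch assumes a membership threshold, a Gaussian kernel, and the closed-form normalized-graph propagation F = (I - alpha S)^(-1) Y. The function names and the parameters `threshold`, `sigma`, and `alpha` are hypothetical. The final classifier-training stage is omitted.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM; returns U with U[i, j] = membership of sample i in cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Wm = U ** m
        centers = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        # Distances from every sample to every center (epsilon avoids division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U


def knn_affinity(X, k, sigma=1.0):
    """Symmetric k-nearest-neighbor graph with Gaussian weights (kernel is an assumption)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]
    mask = np.zeros(d2.shape, dtype=bool)
    mask[np.arange(X.shape[0])[:, None], nearest] = True
    mask |= mask.T  # keep an edge if either endpoint selected it
    return np.where(mask, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)


def preselect_and_propagate(X_lab, y_lab, X_unl, n_classes,
                            k=5, threshold=0.8, alpha=0.9):
    """Pre-select confident unlabeled samples with FCM, then propagate labels to them."""
    # Assumed criterion: keep samples whose largest membership reaches the threshold.
    U = fuzzy_c_means(X_unl, n_classes)
    picked = np.flatnonzero(U.max(axis=1) >= threshold)

    # Build the sparse graph only over labeled samples plus the pre-selected ones.
    X_graph = np.vstack([X_lab, X_unl[picked]])
    W = knn_affinity(X_graph, k)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))  # symmetric normalization D^-1/2 W D^-1/2

    # One-hot label matrix; rows of unlabeled samples stay zero.
    Y = np.zeros((X_graph.shape[0], n_classes))
    Y[np.arange(len(y_lab)), y_lab] = 1.0

    # Closed-form propagation: F = (I - alpha * S)^-1 Y.
    F = np.linalg.solve(np.eye(len(Y)) - alpha * S, Y)
    return picked, F[len(y_lab):].argmax(axis=1)
```

`preselect_and_propagate` returns the indices of the pre-selected unlabeled samples and their propagated labels. In the pipeline the abstract describes, these would expand the labeled set before the final classifier is trained on it together with the remaining unlabeled samples.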
