Semi-supervised classification algorithm based on C-means clustering and graph transduction

Journal of Computer Applications, 2017, Vol. 37, Issue (9): 2595-2599. DOI: 10.11772/j.issn.1001-9081.2017.09.2595

WANG Na, WANG Xiaofeng, GENG Guohua, SONG Qiannan

College of Information Science and Technology, Northwest University, Xi'an, Shaanxi 710000, China

Received: 2017-04-01 Revised: 2017-06-01 Online: 2017-09-10 Published: 2017-09-13
Corresponding author: WANG Xiaofeng, xfwang@nwu.edu.cn
About the authors: WANG Na (王娜, born 1993), female, from Yulin, Shaanxi, M.S. candidate; research interests: image processing, pattern recognition. WANG Xiaofeng (王小凤, born 1979), female, from Weinan, Shaanxi, associate professor, Ph.D., CCF member; research interests: data mining, 3D model retrieval, pattern recognition. GENG Guohua (耿国华, born 1955), female, from Laixi, Shandong, professor, Ph.D., CCF member; research interests: scientific computing visualization, pattern recognition, intelligent information processing. SONG Qiannan (宋倩楠, born 1994), female, from Yuncheng, Shanxi, M.S. candidate; research interests: image processing, pattern recognition.
Supported by:
This work is partially supported by the Youth Science Foundation of the National Natural Science Foundation of China (61602380), the General Program of the National Natural Science Foundation of China (61373117, 61673319), and the Shaanxi Province International Cooperation Project (2013KW04-04).

Abstract

To address the heavy computation and limited accuracy of the traditional Graph Transduction (GT) algorithm, a semi-supervised classification algorithm based on C-means clustering and graph transduction is proposed. First, the Fuzzy C-Means (FCM) clustering algorithm pre-selects unlabeled samples, shrinking the data set on which the GT algorithm builds its graph. Then, a k-nearest-neighbor sparse graph is constructed to reduce false connections in the similarity matrix and thereby shorten graph construction time, and the labels of the pre-selected unlabeled samples are obtained by label propagation. Finally, under a semi-supervised manifold assumption model, the expanded labeled data set and the remaining unlabeled data set are used to train the classifier, which yields the final classification result. On the Weizmann Horse data set, the classification accuracy of the proposed algorithm exceeded 96% in all cases; compared with the traditional classification method that uses GT alone, it resolved the dependence on the initial labeled set and improved accuracy by at least 10%. Applied directly to the Terracotta Warriors data set, the proposed algorithm also achieved a classification accuracy above 95%, clearly higher than that of the traditional GT algorithm. The experimental results show that the semi-supervised classification algorithm based on C-means clustering and graph transduction classifies images well and is of research value for accurate image classification.

Key words: C-means clustering, Graph Transduction (GT), semi-supervised classification, similarity matrix, sparse graph
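
The first two stages described in the abstract (FCM pre-selection, then label propagation on a k-nearest-neighbor sparse graph) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the pre-selection criterion, the similarity kernel, or the propagation rule, so the membership threshold, the Gaussian kernel, and the closed-form propagation in the style of local and global consistency (reference [3]) are all assumptions, and every function and parameter name below is hypothetical. The final manifold-regularized classifier stage is left out.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy C-means. Returns (centers, U) with U of shape (n_samples, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships of each sample sum to 1
    for _ in range(n_iter):
        Wm = U ** m
        centers = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def knn_affinity(X, k=5, sigma=1.0):
    """Symmetric k-nearest-neighbor graph with Gaussian weights (assumed kernel)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)               # a sample is never its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]        # indices of the k nearest samples
    mask = np.zeros(d2.shape, dtype=bool)
    mask[np.arange(X.shape[0])[:, None], idx] = True
    mask |= mask.T                             # keep an edge if either end selected it
    return np.where(mask, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)


def propagate_labels(W, y, n_classes, alpha=0.99):
    """Closed-form propagation F = (I - alpha * S)^-1 Y; y uses -1 for unlabeled."""
    n = len(y)
    labeled = y >= 0
    Y = np.zeros((n, n_classes))
    Y[labeled, y[labeled]] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    pred = F.argmax(axis=1)
    pred[labeled] = y[labeled]                 # keep the given labels fixed
    return pred


def preselect_and_propagate(X, y, n_classes, membership_threshold=0.8, k=5, sigma=1.0):
    """Label only the samples FCM is confident about; the rest stay at -1."""
    _, U = fuzzy_c_means(X, c=n_classes)
    confident = U.max(axis=1) >= membership_threshold   # assumed selection rule
    keep = np.flatnonzero((y >= 0) | confident)
    W = knn_affinity(X[keep], k=k, sigma=sigma)          # graph over the reduced set only
    y_out = y.copy()
    y_out[keep] = propagate_labels(W, y[keep], n_classes)
    return y_out
```

Samples that remain at -1 in the returned array correspond to the "remaining unlabeled data set" that the abstract hands to the final classifier together with the expanded labeled set. Building the graph only over the kept samples is what reduces the cost of the dense pairwise-distance step in this sketch.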

CLC Number:

TP391.4

WANG Na, WANG Xiaofeng, GENG Guohua, SONG Qiannan. Semi-supervised classification algorithm based on C-means clustering and graph transduction[J]. Journal of Computer Applications, 2017, 37(9): 2595-2599.

