Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (3): 686-693. DOI: 10.11772/j.issn.1001-9081.2020071095

Special Issue: Artificial Intelligence

• Artificial intelligence •

Co-training algorithm combining improved density peak clustering and shared subspace

LYU Jia1,2, XIAN Yan1,2   

  1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China;
    2. Chongqing Center of Engineering Technology Research on Digital Agriculture Service, Chongqing Normal University, Chongqing 401331, China
  • Received:2020-07-24 Revised:2020-10-06 Online:2021-03-10 Published:2020-11-12
  • Supported by:
    This work is partially supported by the Major Project of the National Natural Science Foundation of China (11991024), the Program of Chongqing University Innovation Research Group (CXQT20015), and the Chongqing Graduate Research and Innovation Project (CYS20241).

  • Corresponding author: LYU Jia
  • About the authors: LYU Jia, born in 1978 in Meishan, Sichuan, is a professor with a Ph.D. and a CCF member; her main research interests include machine learning and data mining. XIAN Yan, born in 1995 in Fuling, Chongqing, is an M.S. candidate; her main research interests include machine learning and data mining.

Abstract: During the iterations of the co-training algorithm, the newly added unlabeled samples may lack useful information, and the labels assigned to the same sample by multiple classifiers may be inconsistent, both of which cause classification errors to accumulate. To solve these problems, a co-training algorithm combining improved density peak clustering and shared subspace was proposed. Firstly, two base classifiers were obtained from complementary attribute sets. Secondly, improved density peak clustering was performed based on the siphon balance rule, and, starting from the cluster centers, the unlabeled samples with high mutual neighbor degree were selected progressively and labeled by the two base classifiers. Finally, the final categories of the samples with inconsistent labels were determined by the shared subspace obtained with the multi-view non-negative matrix factorization algorithm. In the proposed algorithm, the unlabeled samples that better represent the spatial structure were selected by the improved density peak clustering and the mutual neighbor degree, and the samples given inconsistent labels were revised via the shared subspace, which addresses the low classification accuracy caused by sample misclassification. The algorithm was validated by multiple comparison experiments on 9 UCI datasets, and the experimental results show that the proposed algorithm achieves the highest classification accuracy on 7 datasets and the second highest on the other 2.
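Since the abstract describes the workflow only at a high level, the following Python sketch is a rough, non-authoritative illustration of that workflow rather than the paper's method: it substitutes the standard Rodriguez-Laio density peak score for the siphon-balance-rule clustering and mutual neighbor degree, averages per-view single-view NMF factorizations as a crude stand-in for the multi-view non-negative matrix factorization shared subspace, and uses decision trees as arbitrary base classifiers. All names and parameters (select_by_density_peaks, shared_subspace, co_train, dc_quantile, batch, and so on) are illustrative assumptions, not from the paper.

```python
# Illustrative sketch only: standard density-peak scoring and averaged single-view NMF
# stand in for the paper's siphon-balance clustering and multi-view NMF shared subspace.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics import pairwise_distances
from sklearn.tree import DecisionTreeClassifier

def select_by_density_peaks(X, n_select, dc_quantile=0.02):
    """Rank samples by the density peak score rho * delta and return the top n_select indices."""
    d = pairwise_distances(X)
    dc = np.quantile(d[d > 0], dc_quantile)        # cutoff distance
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1   # Gaussian local density (minus self)
    delta = np.empty_like(rho)                     # distance to nearest denser sample
    for i in range(len(rho)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = d[i].max() if denser.size == 0 else d[i, denser].min()
    return np.argsort(rho * delta)[::-1][:n_select]

def shared_subspace(X_view1, X_view2, n_components=2):
    """Crude stand-in for multi-view NMF: factorize each view separately (after shifting it
    to be non-negative) and average the per-sample coefficient matrices."""
    W = np.zeros((len(X_view1), n_components))
    for V in (X_view1, X_view2):
        V = V - V.min()
        W += NMF(n_components=n_components, init="nndsvda", max_iter=500).fit_transform(V)
    return W / 2

def co_train(X, y, labeled_idx, view1_cols, view2_cols, rounds=10, batch=20):
    """Two classifiers on complementary attribute sets label density-peak-selected samples;
    a disagreement is resolved by the nearest labeled neighbor in the shared subspace.
    y must hold placeholder values (e.g. -1) at unlabeled positions."""
    labeled = set(int(i) for i in labeled_idx)
    c1, c2 = DecisionTreeClassifier(), DecisionTreeClassifier()
    for _ in range(rounds):
        lab = np.array(sorted(labeled))
        unl = np.array([i for i in range(len(X)) if i not in labeled])
        if unl.size == 0:
            break
        c1.fit(X[lab][:, view1_cols], y[lab])
        c2.fit(X[lab][:, view2_cols], y[lab])
        picked = unl[select_by_density_peaks(X[unl], min(batch, unl.size))]
        p1 = c1.predict(X[picked][:, view1_cols])
        p2 = c2.predict(X[picked][:, view2_cols])
        both = np.concatenate([lab, picked])
        W = shared_subspace(X[both][:, view1_cols], X[both][:, view2_cols])
        W_lab, W_pick = W[:lab.size], W[lab.size:]
        for k, i in enumerate(picked):
            if p1[k] == p2[k]:
                y[i] = p1[k]      # consistent label: accept it
            else:                 # inconsistent label: revise via the shared subspace
                y[i] = y[lab[np.linalg.norm(W_lab - W_pick[k], axis=1).argmin()]]
            labeled.add(int(i))
    return c1, c2
```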

Key words: co-training, density peak clustering, siphon balance rule, shared subspace, mutual neighbor degree


CLC Number: