Journal of Computer Applications, 2020, Vol. 40, Issue (12): 3430-3436. DOI: 10.11772/j.issn.1001-9081.2020060893

• 2020 China Conference on Granular Computing and Knowledge Discovery (CGCKD 2020) •

Semi-supervised learning algorithm of graph based on label metric learning

LYU Yali1,2, MIAO Junzhong1, HU Weixin1   

  1. School of Information, Shanxi University of Finance and Economics, Taiyuan Shanxi 030006, China;
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China
  • Received: 2020-06-12 Revised: 2020-08-20 Online: 2020-12-10 Published: 2020-10-20
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Shanxi Province (201801D121115) and the Research Project of Shanxi Scholarship Council of China (2020-095).

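  • Corresponding author: LYU Yali (吕亚丽), born in 1975, female, from Linfen, Shanxi; Ph.D., associate professor, CCF member. Her research interests include data mining, machine learning, and probabilistic reasoning. sxlvyali@126.com
  • About the authors: MIAO Junzhong (苗钧重), born in 1993, male, from Jinzhong, Shanxi; M.S. candidate. His research interests include data mining and machine learning. HU Weixin (胡玮昕), born in 1996, female, from Jinzhong, Shanxi; M.S. candidate. Her research interests include data mining and machine learning.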

Abstract: Most graph-based semi-supervised learning methods measure the similarity between samples without using either the known labels or the labels obtained during label propagation. Moreover, their similarity measures are relatively fixed and therefore cannot effectively capture the similarity between data samples whose distribution structures are complex and varied. To address these problems, a graph-based semi-supervised learning algorithm that performs metric learning from labels was proposed. Firstly, a similarity measure between samples was specified, and the similarity matrix was constructed from it. Secondly, labels were propagated over the similarity matrix, and k low-entropy samples were selected as newly determined label information. Finally, the similarity measure was updated by using all available label information, and this process was repeated until the labels of all samples were learned. The proposed algorithm not only uses label information to improve the similarity measure between samples, but also makes full use of intermediate results to reduce the amount of labeled data required for semi-supervised learning. Experimental results on six real datasets show that, compared with three traditional graph-based semi-supervised learning algorithms, the proposed algorithm achieves higher classification accuracy in more than 95% of the cases.
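
The loop described in the abstract (build a similarity matrix, propagate labels, accept k low-entropy samples, update the similarity measure, repeat) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the similarity measure or the metric update, so the Gaussian kernel over a feature-weighted distance, the Fisher-style variance-ratio update, the clamped iterative propagation, all function names, the value of `sigma`, and the toy data are assumptions made here for illustration.

```python
import math

def similarity_matrix(X, w, sigma=0.5):
    """Gaussian similarity under a feature-weighted squared Euclidean distance (assumed form)."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum(wk * (a - b) ** 2 for wk, a, b in zip(w, X[i], X[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def propagate(W, labels, n_classes, n_iter=50):
    """Iterative label propagation; labeled rows stay clamped to one-hot vectors."""
    n = len(W)
    F = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for i, y in labels.items():
        F[i] = [1.0 if c == y else 0.0 for c in range(n_classes)]
    for _ in range(n_iter):
        new_F = []
        for i in range(n):
            if i in labels:
                new_F.append(F[i])
                continue
            row = [sum(W[i][j] * F[j][c] for j in range(n)) for c in range(n_classes)]
            total = sum(row) or 1.0
            new_F.append([v / total for v in row])
        F = new_F
    return F

def entropy(p):
    """Shannon entropy of a label distribution."""
    return -sum(v * math.log(v) for v in p if v > 0)

def update_weights(X, labels, n_classes, eps=1e-9):
    """Hypothetical metric update: per-feature between/within-class variance ratio."""
    dims = len(X[0])
    ratios = []
    for k in range(dims):
        vals = [X[i][k] for i in labels]
        mean = sum(vals) / len(vals)
        between = within = 0.0
        for c in range(n_classes):
            vc = [X[i][k] for i, y in labels.items() if y == c]
            if not vc:
                continue
            mc = sum(vc) / len(vc)
            between += len(vc) * (mc - mean) ** 2
            within += sum((v - mc) ** 2 for v in vc)
        ratios.append(between / (within + eps))
    total = sum(ratios)
    if total == 0:
        return [1.0] * dims
    return [dims * r / total for r in ratios]  # weights sum to the number of features

def label_metric_ssl(X, labels, n_classes, k=2, sigma=0.5):
    """Repeat: similarity -> propagation -> accept k low-entropy samples -> metric update."""
    labels = dict(labels)          # index -> class; grows as pseudo-labels are accepted
    w = [1.0] * len(X[0])          # start from the plain Euclidean distance
    while len(labels) < len(X):
        W = similarity_matrix(X, w, sigma)
        F = propagate(W, labels, n_classes)
        unlabeled = [i for i in range(len(X)) if i not in labels]
        unlabeled.sort(key=lambda i: entropy(F[i]))
        for i in unlabeled[:k]:    # the k most confident (lowest-entropy) predictions
            labels[i] = max(range(n_classes), key=lambda c: F[i][c])
        w = update_weights(X, labels, n_classes)
    return labels, w

# Toy data (made up): feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.15, 0.6],
     [2.0, 0.2], [2.1, 0.8], [1.9, 0.5], [2.05, 0.0]]
result_labels, weights = label_metric_ssl(X, {0: 0, 4: 1}, n_classes=2)
print(sorted(result_labels.items()), weights)
```

In this sketch the accepted pseudo-labels feed back into the feature weights, so an informative feature gains influence on the similarity matrix in later rounds; any other metric-learning step could be substituted for `update_weights` without changing the outer loop.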

Key words: machine learning, graph-based semi-supervised learning, metric learning, label propagation, similarity matrix

