Journal of Computer Applications, 2019, Vol. 39, Issue (12): 3462-3466. DOI: 10.11772/j.issn.1001-9081.2019050813

• Artificial Intelligence •

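  • About the authors: DU Yang (1994-), male, born in Yangzhou, Jiangsu, master's student, research interest: machine learning; JIANG Zhen (1976-), male, born in Yantai, Shandong, associate professor, Ph.D., research interest: machine learning; FENG Lujie (1996-), female, born in Huai'an, Jiangsu, master's student, research interest: machine learning.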

Novel learning algorithm combining support vector machine and semi-supervised K-means

DU Yang, JIANG Zhen, FENG Lujie   

  1. College of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang Jiangsu 212013, China
  • Received:2019-05-14 Revised:2019-07-23 Online:2019-12-10 Published:2019-12-17
  • Contact: JIANG Zhen
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672268) and the Research Initiation Fund for Senior Talents of Jiangsu University (14JDG036).


Abstract: Semi-supervised learning combines a small number of labeled samples with a large number of unlabeled samples and can effectively improve the generalization performance of an algorithm. Traditional semi-supervised Support Vector Machine (SVM) algorithms introduce terms that depend on the unlabeled samples into the objective function to push the decision surface through low-density regions, but this often brings high computational complexity and convergence to local optima. Meanwhile, semi-supervised K-means algorithms face the problem of how to use the supervised information effectively to initialize and update the centroids. To address these problems, a novel learning algorithm, Semi-supervised K-means Assisted SVM (SKAS), was proposed. Firstly, an improved semi-supervised K-means algorithm was proposed, with improvements in two aspects: the distance metric and centroid iteration. Then, a fusion algorithm was designed to combine the semi-supervised K-means algorithm with SVM to further improve performance. Experimental results on six UCI datasets show that the proposed algorithm outperforms current state-of-the-art semi-supervised SVM and semi-supervised K-means algorithms on five of the datasets and achieves the highest average accuracy.
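For orientation, a common textbook form of the semi-supervised SVM (S3VM/TSVM) objective that the abstract refers to is given below. It is not taken from this paper, and the symbols are assumptions: $l$ labeled samples $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1, +1\}$, $u$ unlabeled samples $\mathbf{x}_j$, and trade-off weights $C_l$ and $C_u$. The third term penalizes unlabeled samples that fall inside the margin, which pushes the decision surface toward low-density regions; because $|f(\mathbf{x}_j)|$ makes that term non-convex, optimization is expensive and can stall in local optima, which are the two drawbacks the abstract names. In practice a class-balance constraint is usually added so that the unlabeled samples are not all assigned to one class.

$$
\min_{\mathbf{w},\,b}\ \frac{1}{2}\|\mathbf{w}\|^2
+ C_l \sum_{i=1}^{l} \max\bigl(0,\ 1 - y_i f(\mathbf{x}_i)\bigr)
+ C_u \sum_{j=l+1}^{l+u} \max\bigl(0,\ 1 - |f(\mathbf{x}_j)|\bigr),
\qquad f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b
$$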
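The abstract notes that semi-supervised K-means must decide how to use labeled samples when initializing and updating centroids. As background only, the sketch below shows the classic Seeded K-means and Constrained K-means baselines (Basu et al., 2002), in which each class centroid starts as the mean of that class's labeled samples. It is not SKAS: the paper's improved distance metric and centroid iteration are not described in this abstract, so plain Euclidean distance and ordinary mean updates are assumed, and the names `seeded_kmeans`, `nearest`, and `constrained` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def nearest(x, centroids):
    """Label of the centroid closest to x under Euclidean distance."""
    return min(centroids, key=lambda c: dist(x, centroids[c]))


def seeded_kmeans(labeled, unlabeled, max_iter=100, constrained=False):
    """Hypothetical sketch of seeded / constrained K-means (not SKAS).

    labeled:     list of (vector, label) pairs; each class needs >= 1 sample.
    unlabeled:   list of vectors.
    constrained: False -> Seeded K-means (seeds may switch clusters);
                 True  -> Constrained K-means (seeds keep their given labels).
    Returns (centroids as {label: vector}, labels for the unlabeled points).
    """
    classes = sorted({y for _, y in labeled})
    # Supervised initialization: each centroid is the mean of its labeled seeds.
    centroids = {c: mean([x for x, y in labeled if y == c]) for c in classes}
    for _ in range(max_iter):
        # Assignment step for unlabeled points.
        assign = [nearest(x, centroids) for x in unlabeled]
        # Seeds either keep their given label or are reassigned like other points.
        seed_assign = [y if constrained else nearest(x, centroids) for x, y in labeled]
        # Update step: recompute each centroid from all of its current members.
        members = {c: [] for c in classes}
        for x, c in zip(unlabeled, assign):
            members[c].append(x)
        for (x, _), c in zip(labeled, seed_assign):
            members[c].append(x)
        new_centroids = {
            c: mean(members[c]) if members[c] else centroids[c] for c in classes
        }
        if new_centroids == centroids:  # stable centroids -> stable assignments
            break
        centroids = new_centroids
    # Final labels for the unlabeled points under the final centroids.
    return centroids, [nearest(x, centroids) for x in unlabeled]
```

For example, with seeds `([0.0, 0.0], "a")` and `([10.0, 10.0], "b")`, unlabeled points near each seed are labeled with that seed's class, and each centroid moves to the mean of its seed plus the unlabeled points assigned to it. The `constrained` flag shows the two standard ways supervision can enter the update step, which is the design choice the abstract says semi-supervised K-means has to make.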

Key words: Support Vector Machine (SVM), K-means, semi-supervised clustering, classification, fusion
