Journal of Computer Applications (计算机应用) ›› 2019, Vol. 39 ›› Issue (4): 1021-1026. DOI: 10.11772/j.issn.1001-9081.2018081817

• Data Science and Technology •

Multiple kernel concept factorization algorithm based on global fusion

LI Fei (1, 2), DU Liang (1, 2, 3), REN Chaohong (1, 2)

  1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China;
  2. Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030006, China;
  3. Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education (Shanxi University), Taiyuan, Shanxi 030006, China
  • Received: 2018-09-13 Revised: 2018-11-21 Online: 2019-04-10 Published: 2019-04-10
  • Corresponding author: DU Liang
  • About the authors: LI Fei (born 1993), male, from Huanggang, Hubei, is an M.S. candidate and a CCF member; his research interests include data mining and machine learning. DU Liang (born 1985), male, from Jinzhong, Shanxi, is a lecturer with a Ph.D. and a CCF member; his research interests include data mining and machine learning. REN Chaohong (born 1994), male, from Shuozhou, Shanxi, is an M.S. candidate; his research interests include data mining and machine learning.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61502289).

Abstract: Non-negative Matrix Factorization (NMF) can only find a low-rank approximation of the original non-negative data, whereas Concept Factorization (CF) extends the matrix factorization model to a single non-linear kernel space, improving the learning ability and general applicability of matrix factorization. To address the problem of how to design or select a suitable kernel function for CF in an unsupervised setting, a Globalized Multiple Kernel CF (GMKCF) algorithm is proposed. Multiple candidate kernel functions are input simultaneously and learned within the CF framework through global linear weight fusion, which yields high-quality, stable clustering results and resolves the kernel selection problem faced by the CF model. The new model is solved by alternating iteration, and the convergence of the algorithm is proved. Experimental results on several real-world datasets show that the proposed algorithm outperforms the comparison algorithms in data clustering: Kernel K-Means (KKM), Spectral Clustering (SC), Kernel CF (KCF), Co-regularized multi-view spectral clustering (Coreg), and Robust Multiple KKM (RMKKM).
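The idea summarized in the abstract (fuse several candidate kernels with global weights inside the CF framework, then solve by alternating iteration) can be illustrated with a minimal sketch. This is not the paper's GMKCF update scheme, which the abstract does not spell out. It assumes the standard multiplicative updates for kernel concept factorization, candidate kernels with non-negative entries (Gaussian kernels here), and a hypothetical weighting rule that minimizes sum_m mu_m^2 * e_m subject to sum_m mu_m = 1, where e_m is the reconstruction error under kernel m. All function and variable names are illustrative.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix; every entry is non-negative."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def cf_error(K, W, V):
    """Kernel-space reconstruction error tr((I - W V^T)^T K (I - W V^T))."""
    R = np.eye(K.shape[0]) - W @ V.T
    return float(np.trace(R.T @ K @ R))

def multi_kernel_cf(kernels, n_clusters, n_iter=100, seed=0, eps=1e-12):
    """Alternate between CF factor updates and global kernel-weight updates.

    Assumed objective (not taken from the paper):
        sum_m mu_m^2 * cf_error(K_m, W, V),  mu >= 0,  sum(mu) = 1.
    """
    rng = np.random.default_rng(seed)
    n = kernels[0].shape[0]
    W = rng.random((n, n_clusters))   # non-negative association matrix
    V = rng.random((n, n_clusters))   # non-negative representation matrix
    mu = np.full(len(kernels), 1.0 / len(kernels))
    history = []
    for _ in range(n_iter):
        # Fused kernel: one global coefficient per candidate kernel.
        K = sum(m**2 * Km for m, Km in zip(mu, kernels))
        # Multiplicative updates for kernel concept factorization.
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)
        # Closed-form weights under the assumed rule: mu_m proportional to 1/e_m.
        errs = np.array([cf_error(Km, W, V) for Km in kernels])
        inv = 1.0 / np.maximum(errs, eps)
        mu = inv / inv.sum()
        history.append(float(np.sum(mu**2 * errs)))
    return W, V, mu, history
```

A cluster label for each sample can then be read off as `V.argmax(axis=1)`. Under the assumed rule, kernels that reconstruct the data poorly receive small weights, which is one simple way to sidestep choosing a single kernel by hand.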

Key words: multiple kernel learning, Concept Factorization (CF), matrix factorization, multiple kernel clustering, global fusion

CLC Number: