计算机应用 ›› 2015, Vol. 35 ›› Issue (6): 1611-1616.DOI: 10.11772/j.issn.1001-9081.2015.06.1611

• 人工智能 • 上一篇    下一篇

基于多图的交替优化图直推方法

修宇1, 王骏2, 王忠群1, 刘三民1   

  1. 1. 安徽工程大学 计算机与信息学院, 安徽 芜湖 241000;
    2. IBM T J Watson 研究中心, 纽约州 约克城高地 10598
  • 收稿日期:2014-12-31 修回日期:2015-04-01 发布日期:2015-06-12
  • 通讯作者: 修宇(1976-),男,山东单县人,讲师,硕士,CCF会员,主要研究方向:机器学习、数据挖掘;xiuyu1860@126.com
  • 作者简介:王骏(1976-),男,安徽芜湖人,研究员,博士,主要研究方向:机器学习、数据挖掘;王忠群(1965-),男,安徽芜湖人,教授,主要研究方向:商务智能;刘三民(1978-),男,安徽岳西人,副教授,博士,主要研究方向:模式识别、网络安全。
  • 基金资助:

    国家自然科学基金资助项目(71371012,71171002);教育部人文社科规划项目(13YJA630098);安徽省优秀青年人才基金重点项目(2013SQRL034Z);安徽省高校省级科学研究项目(TSKJ2014B10);安徽工程大学计算机应用技术重点实验室开放基金资助项目(JSJKF201510)。

Graph transduction via alternating minimization method based on multi-graph

XIU Yu1, WANG Jun2, WANG Zhongqun1, LIU Sanmin1   

  1. 1. College of Computer and Information Engineering, Anhui Polytechnic University, Wuhan Anhui 241000, China;
    2. IBM T. J. Watson Research Center, Yorktown Heights New York 10598, USA
  • Received:2014-12-31 Revised:2015-04-01 Published:2015-06-12

摘要:

针对基于单图的半监督学习(GSSL)算法的性能受单个图质量的影响,且在单视图数据下,大多数基于多图的GSSL算法难以使用的问题,提出了一种基于多图的交替优化图直推方法(MG-GTAM)。首先,使用不同的图构建参数来构建单视图数据下的多个图,利用多个图来表达数据间关系;然后,借助交替迭代方式综合多个图的信息,选择置信度高的未标记样本进行伪标记并通过权重权衡各图的重要程度,以优化多图上的预测函数的一致性和平滑性;最后通过组合每个图的预测函数完成对所有未标记样本的标记。仿真实验表明,与经典的局部和全局一致(LGC)、高斯随机场和调和函数(GFHF)、交替优化直推(GTAM)、组合图拉普拉斯(CGL)算法相比,在COIL20目标物体数据集和NEC Animal数据集上,MG-GTAM的分类错误率比这些经典算法均有下降,表明了该方法具有良好的性能。实验结果表明, MG-GTAM能有效地利用多个图来表达数据之间的关系,获得更低的分类错误率。

关键词: 图半监督学习, 图直推, 图构建, 多图, 交替优化

Abstract:

The performance of the Graph-based Semi-Supervised Learning (GSSL) method based on one graph mainly depends on a well-structured single graph and most algorithms based on multiple graphs are difficult to be applied while the data has only single view. Aiming at the issue, a Graph Transduction via Alternating Minimization method based on Multi-Graph (MG-GTAM) was proposed. Firstly, using different graph construction parameters, multiple graphs were constructed from data with one single view to represent data point relation. Secondly,the most confident unlabeled examples were chosen for pseudo label assignment through the integration of a plurality of map information and imposed higher weights to the most relevant graphs based on alternating optimization,which optimized agreement and smoothness of prediction function over multiple graphs. Finally, more accurate labels were given over the entire unlabeled examples by combining the predictions of all individual graphs. Compared with the classical algorithms of Local and Global Consistency (LGC), Gaussian Fields and Harmonic Functions (GFHF), Graph Transduction via Alternation Minimization (GTAM), Combined Graph Laplacian (CGL), the classification error rates of MG-GTAM decrease on data sets of COIL20 and NEC Animal. The experimental results show that the proposed method can efficiently represent data point relation with multiple graphs, and has lower classification error rate.

Key words: Graph-based Semi-Supervised Learning (GSSL), graph transduction, graph construction, multi-graph, alternating minimization

中图分类号: