Journal of Computer Applications, 2015, Vol. 35, Issue (8): 2233-2237. DOI: 10.11772/j.issn.1001-9081.2015.08.2237

• Artificial Intelligence •

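Multi-instance multi-label learning method based on topic model

YAN Kaobi1, LI Zhixin1,2, ZHANG Canlong1,2

  1. Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004, China;
  2. Guangxi Experiment Center of Information Science, Guilin, Guangxi 541004, China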
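  • Received: 2015-03-27 Revised: 2015-05-30 Online: 2015-08-10 Published: 2015-08-14
  • Corresponding author: LI Zhixin (born 1971), male, from Guilin, Guangxi, associate professor, Ph.D., CCF member. Research interests: machine learning, image understanding. lizx@gxnu.edu.cn
  • About the authors: YAN Kaobi (born 1988), male, from Ganzhou, Jiangxi, M.S. candidate. Research interests: machine learning, image understanding. ZHANG Canlong (born 1975), male, from Loudi, Hunan, associate professor, Ph.D. Research interests: pattern recognition, visual object tracking.
  • Supported by:

    National Natural Science Foundation of China (61165009, 61262005, 61363035, 61365009); National Basic Research Program of China (973 Program) (2012CB326403); Natural Science Foundation of Guangxi (2012GXNSFAA053219, 2013GXNSFAA019345, 2014GXNSFAA118368).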

Abstract:

Most existing Multi-Instance Multi-Label (MIML) algorithms do not consider how to better represent the features of objects. To address this problem, a MIML learning method based on a topic model is proposed, which combines the Probabilistic Latent Semantic Analysis (PLSA) model with a Neural Network (NN). The algorithm uses the PLSA model to learn the latent topic distribution of every training example; this step is a feature learning process that yields a better feature representation. The latent topic distribution of each training example is then used as the input for training the neural network. Given a test example, the algorithm learns its latent topic distribution and feeds that distribution into the trained neural network to obtain the label set of the test example. Experimental results show that, compared with two classical MIML algorithms based on the decomposition strategy, the proposed method achieves better performance on two real-world MIML learning tasks.

Key words: topic model, feature expression, multi-instance multi-label learning, scene classification, text categorization
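
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes each bag has already been converted into a word-count vector (for example by quantizing its instances into visual words), and it uses a one-hidden-layer network with sigmoid outputs; the abstract specifies neither choice. The names `plsa`, `train_mlp`, and `predict`, and all hyperparameters, are hypothetical.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, p_w_z=None, seed=0):
    """EM for PLSA on a (bags x words) count matrix.

    If p_w_z is given it is held fixed ("fold-in"), so only P(z|d) is
    estimated. This is one common way to get topics for an unseen bag.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    learn_topics = p_w_z is None
    if learn_topics:
        p_w_z = rng.random((n_topics, n_words))
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]      # shape (d, z, w)
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        weighted = counts[:, None, :] * post               # n(d,w) * P(z|d,w)
        # M-step: renormalize the expected counts
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
        if learn_topics:
            p_w_z = weighted.sum(axis=0)
            p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mlp(X, Y, n_hidden=8, lr=0.5, n_epochs=500, seed=0):
    """Assumed architecture: tanh hidden layer, one sigmoid unit per label."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)
        P = sigmoid(H @ W2 + b2)
        d_out = (P - Y) / len(X)              # gradient of mean cross-entropy
        d_hid = (d_out @ W2.T) * (1 - H ** 2)
        W2 -= lr * H.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X, threshold=0.5):
    """Return a 0/1 label matrix by thresholding the sigmoid outputs."""
    W1, b1, W2, b2 = params
    probs = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
    return (probs >= threshold).astype(int)

# Toy data standing in for real bags: random word counts and label sets.
rng = np.random.default_rng(0)
train_counts = rng.integers(0, 5, size=(20, 30)).astype(float)
train_labels = rng.integers(0, 2, size=(20, 3))
test_counts = rng.integers(0, 5, size=(4, 30)).astype(float)

train_topics, p_w_z = plsa(train_counts, n_topics=5)          # feature learning
net = train_mlp(train_topics, train_labels)                   # classifier
test_topics, _ = plsa(test_counts, n_topics=5, p_w_z=p_w_z)   # fold-in
pred = predict(net, test_topics)                              # label sets
```

Holding `p_w_z` fixed when processing test bags keeps the topic space identical between training and testing, so the network receives inputs of the kind it was trained on.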

CLC Number: