Journal of Computer Applications, 2018, Vol. 38, Issue (6): 1547-1553. DOI: 10.11772/j.issn.1001-9081.2017112815

• Artificial Intelligence •

  • Corresponding author: RAO Wenbi
  • About the authors: TAN Yao (1992-), male, born in Wuhan, Hubei, M. S. candidate; his research interests include artificial intelligence, machine learning and computer vision. RAO Wenbi (1967-), female, born in Wuhan, Hubei, Ph. D., professor; her research interests include artificial intelligence, machine learning, data mining and pervasive computing.

Heterogeneous compound transfer learning method for video content annotation

TAN Yao1, RAO Wenbi1,2   

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan Hubei 430070, China;
    2. Hubei Key Laboratory of Transportation Internet of Things(Wuhan University of Technology), Wuhan Hubei 430070, China
  • Received:2017-11-30 Revised:2017-12-19 Online:2018-06-10 Published:2018-06-13
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61702386).



Abstract: Traditional machine learning has the disadvantage of requiring a large amount of manual annotation to train a model, and most current transfer learning methods are only applicable to homogeneous feature spaces. To address these problems, a Heterogeneous Compound Transfer Learning (HCTL) method for video content annotation was proposed. Firstly, based on the correspondence between videos and images, Canonical Correlation Analysis (CCA) was applied to homogenize the feature spaces of the image domain (source domain) and the video domain (target domain). Then, based on the idea of minimizing the cost of projecting these two feature spaces onto a common space, a transformation matrix aligning the source-domain feature space to the target-domain feature space was found. Finally, the source-domain features were translated into the target-domain feature space by the alignment matrix, which realized the knowledge transfer and completed the video content annotation task. The mean annotation precision of the proposed HCTL on the Kodak database reaches 35.81%, which is 58.03%, 23.06%, 45.04%, 6.70%, 15.52%, 13.07% and 6.74% higher than that of Standard Support Vector Machine (S-SVM), Domain Adaptation Support Vector Machine (DASVM), Heterogeneous Transductive Transfer Learning (HTTL), Cross Domain Structural Model (CDSM), Domain Selection Machine (DSM), Multi-domain Adaptation with Heterogeneous Sources (MDA-HS) and Discriminative Correlation Analysis (DCA) respectively; on the Columbia Consumer Video (CCV) database it reaches 20.73%, with relative improvements of 133.71%, 37.28%, 14.34%, 24.88%, 16.40%, 20.73% and 12.48% respectively. The experimental results show that the compound transfer idea of homogenizing first and then aligning can effectively improve the recognition accuracy in heterogeneous domain adaptation problems.

Key words: video annotation, transfer learning, domain adaptation, heterogeneous space, subspace alignment
