Journal of Computer Applications, 2018, Vol. 38, Issue (6): 1547-1553. DOI: 10.11772/j.issn.1001-9081.2017112815

• Artificial Intelligence •

  • Corresponding author: RAO Wenbi
  • About the authors: TAN Yao (1992-), male, born in Wuhan, Hubei, M. S. candidate; his research interests include artificial intelligence, machine learning and computer vision. RAO Wenbi (1967-), female, born in Wuhan, Hubei, Ph. D., professor; her research interests include artificial intelligence, machine learning, data mining and pervasive computing.

Heterogeneous compound transfer learning method for video content annotation

TAN Yao1, RAO Wenbi1,2   

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan Hubei 430070, China;
    2. Hubei Key Laboratory of Transportation Internet of Things(Wuhan University of Technology), Wuhan Hubei 430070, China
  • Received:2017-11-30 Revised:2017-12-19 Online:2018-06-10 Published:2018-06-13
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61702386).



Abstract: Traditional machine learning has the disadvantage of requiring a large amount of manual annotation to train a model, and most current transfer learning methods are only applicable to homogeneous feature spaces. To address these problems, a Heterogeneous Compound Transfer Learning (HCTL) method for video content annotation was proposed. Firstly, based on the correspondence between videos and images, Canonical Correlation Analysis (CCA) was applied to homogenize the feature spaces of the image domain (source domain) and the video domain (target domain). Then, based on the idea of minimizing the cost of projecting these two feature spaces onto a common space, a transformation matrix aligning the source-domain feature space to the target-domain feature space was found. Finally, the source-domain features were translated into the target-domain feature space by the alignment matrix, which realized the knowledge transfer and completed the video content annotation task. The mean annotation precision of the proposed HCTL on the Kodak database reaches 35.81%, which is 58.03%, 23.06%, 45.04%, 6.70%, 15.52%, 13.07% and 6.74% higher than that of Standard Support Vector Machine (S-SVM), Domain Adaptation Support Vector Machine (DASVM), Heterogeneous Transductive Transfer Learning (HTTL), Cross Domain Structural Model (CDSM), Domain Selection Machine (DSM), Multi-domain Adaptation with Heterogeneous Sources (MDA-HS) and Discriminative Correlation Analysis (DCA) respectively; on the Columbia Consumer Video (CCV) database it reaches 20.73%, with relative improvements of 133.71%, 37.28%, 14.34%, 24.88%, 16.40%, 20.73% and 12.48% respectively. The experimental results show that the compound transfer idea of homogenizing first and then aligning can effectively improve the recognition accuracy in heterogeneous domain adaptation problems.

Key words: video annotation, transfer learning, domain adaptation, heterogeneous space, subspace alignment
