Video keyframe extraction based on users' interests

Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (11): 3139-3144. DOI: 10.11772/j.issn.1001-9081.2017.11.3139

• The 16th China Conference on Machine Learning (CCML 2017) •

YU Huangyue, WANG Han, GUO Mengting

College of Information Science and Technology, Beijing Forestry University, Beijing 100083, China

Received: 2017-05-16 Revised: 2017-06-26 Online: 2017-11-11 Published: 2017-11-10
Corresponding author: WANG Han
About the authors: YU Huangyue (born 1996), female, from Nanchang, Jiangxi; research interests: digital image processing, video retrieval. WANG Han (born 1986), female, from Changsha, Hunan; lecturer, Ph.D.; research interests: video and image retrieval, machine learning. GUO Mengting (born 1996), female, from Beijing; research interests: image processing, image retrieval.
Supported by:
This work is partially supported by the Fundamental Research Funds for the Central Universities (2015ZCQ-XX).

Abstract

Current video key-information extraction techniques mainly extract keyframes from low-level video features and ignore semantic information related to users' interests. Semantic modeling of video requires a large number of annotated training videos, which are time-consuming and laborious to collect. To alleviate this problem, a large amount of Internet image data, which is rich in content and covers many events, was used to construct a semantic model based on users' interests. However, images obtained from the Internet are diverse and often noisy, so brute-force transfer would severely degrade the final extraction results. A synonym joint-weight model was therefore proposed to weigh groups of Internet images that differ from one another but are semantically similar, and these image groups were used to construct the semantic model. Semantic weights were obtained through joint weight learning, and the weight of each image group determined its contribution to knowledge transfer. The proposed method was validated on multiple videos from different video websites. The experimental results show that joint-weight semantic modeling of the content users are interested in captures information more comprehensively and accurately, and thus effectively guides video keyframe extraction.

Key words: video retrieval, keyframe extraction, video analysis, knowledge transfer
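The abstract describes weighting semantically similar image groups and using the weighted groups to guide keyframe selection, but it does not give the formulation. The following is only a minimal sketch of that general idea under stated assumptions: the softmax weighting heuristic, the mean-vector group prototypes, the cosine scoring, and all function names (`group_weights`, `select_keyframes`, and so on) are hypothetical stand-ins, not the paper's joint weight learning.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def group_weights(groups, frames, temperature=0.1):
    """Assign one weight per image group; the weights sum to 1.

    ASSUMPTION: a softmax over the similarity between each group's mean
    feature and the video's mean feature. This heuristic merely stands in
    for the joint weight learning named in the abstract.
    """
    video_mean = mean_vector(frames)
    sims = [cosine(mean_vector(group), video_mean) for group in groups]
    exps = [math.exp(s / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def select_keyframes(groups, frames, k):
    """Return the indices (in temporal order) of the k best-scoring frames.

    Each frame is scored by the weighted sum of its cosine similarity to
    every group prototype, so low-weight (less relevant) groups contribute
    little to the transferred knowledge.
    """
    weights = group_weights(groups, frames)
    prototypes = [mean_vector(group) for group in groups]
    scores = [
        sum(w * cosine(frame, p) for w, p in zip(weights, prototypes))
        for frame in frames
    ]
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy 2-D features: two image groups (e.g. two near-synonym queries)
    # and four video frames. Real features would be far higher-dimensional.
    groups = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
    frames = [[1.0, 0.05], [0.2, 1.0], [0.95, 0.0], [0.5, 0.5]]
    print(select_keyframes(groups, frames, k=2))
```

In this toy setup the first group lies closer to the video as a whole, so it receives most of the weight and the frames resembling it are selected. A learned weighting, as the abstract describes, would replace `group_weights` while the scoring step could stay the same.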

CLC number: TP391.41

YU Huangyue, WANG Han, GUO Mengting. Video keyframe extraction based on users' interests[J]. Journal of Computer Applications, 2017, 37(11): 3139-3144.
