Journal of Computer Applications, 2016, Vol. 36, Issue (3): 770-773. DOI: 10.11772/j.issn.1001-9081.2016.03.770

• Virtual Reality and Digital Media •

Video semantic detection based on topographic independent component analysis and Gaussian mixture model

KONG Weiting (孔玮婷), ZHAN Yongzhao (詹永照)

  1. College of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang Jiangsu 212013, China
  • Received: 2015-08-24 Revised: 2015-10-20 Online: 2016-03-10 Published: 2016-03-17
  • Corresponding author: KONG Weiting
  • About the authors: KONG Weiting (born 1991), female, from Nanjing, Jiangsu, M.S. candidate; main research interest: multimedia. ZHAN Yongzhao (born 1962), male, from Zhenjiang, Jiangsu, professor, Ph.D., CCF senior member; main research interests: human-computer interaction, pattern recognition, multimedia.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61170126).

Abstract: To reduce the quantization error introduced by vector quantization in Bag of Words (BoW) methods for video semantic concept detection, and to extract low-level video features automatically and more effectively, a video semantic concept detection method based on Topographic Independent Component Analysis (TICA) and Gaussian Mixture Model (GMM) was proposed. Firstly, features were extracted from each video clip by the TICA algorithm, which can learn complex invariant features from video clips. Secondly, the visual features of each video clip were modeled by a GMM that describes their distribution. Finally, a GMM supervector was constructed from the GMM parameters of each clip and used as the input of a Support Vector Machine (SVM) for video semantic concept detection. A GMM can be regarded as an extension of BoW to a probabilistic framework; it therefore incurs less quantization error, retains more of the information in the original feature vectors, and is robust. Experiments were conducted on the TRECVID 2012 and OV video datasets. The results show that, compared with the conventional BoW and SIFT (Scale Invariant Feature Transform)-GMM methods, the proposed method improves the mean average precision of video semantic concept detection on both datasets.

Key words: video semantic detection, Topographic Independent Component Analysis (TICA), Gaussian Mixture Model (GMM), Bag of Words (BoW) model, Support Vector Machine (SVM)
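
The abstract's claim that a GMM extends BoW to a probabilistic framework can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the per-clip feature vectors have already been extracted (TICA itself is not shown), that the GMM has diagonal covariances and is already fitted, and that the supervector is formed by stacking MAP-adapted component means. That construction is a common one but is not specified in the abstract. The `relevance` parameter and all function names are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Soft assignment: P(component k | x), computed stably in log space."""
    logs = [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def bow_histogram(features, means):
    """Hard assignment (BoW): each feature votes only for its nearest codeword."""
    hist = [0.0] * len(means)
    for x in features:
        dists = [sum((xi - mi) ** 2 for xi, mi in zip(x, m)) for m in means]
        hist[dists.index(min(dists))] += 1.0
    return [h / len(features) for h in hist]

def soft_histogram(features, weights, means, variances):
    """Soft assignment (GMM): each feature spreads its vote over all components."""
    hist = [0.0] * len(means)
    for x in features:
        for k, p in enumerate(posteriors(x, weights, means, variances)):
            hist[k] += p
    return [h / len(features) for h in hist]

def mean_supervector(features, weights, means, variances, relevance=1.0):
    """Stack MAP-adapted component means into one vector (assumed construction)."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in features:
        for k, p in enumerate(posteriors(x, weights, means, variances)):
            counts[k] += p
            for d in range(dim):
                sums[k][d] += p * x[d]
    supervector = []
    for k in range(n_comp):
        # alpha blends the clip's own statistics with the prior component mean
        alpha = counts[k] / (counts[k] + relevance)
        for d in range(dim):
            data_mean = sums[k][d] / counts[k] if counts[k] > 0 else means[k][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[k][d])
    return supervector

# Toy model: two components (codewords) in 2-D, unit variances, equal weights.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Two features lying just on the left side of the midpoint between the codewords.
features = [[1.9, 0.0], [1.9, 0.0]]

bow = bow_histogram(features, means)                       # all mass on codeword 0
soft = soft_histogram(features, weights, means, variances) # mass shared by both
sv = mean_supervector(features, weights, means, variances) # length = 2 * 2
print(bow, soft, sv)
```

In this toy case the hard histogram puts every vote on the first codeword even though both features sit almost midway between the two, which is the quantization error the abstract refers to. The soft histogram keeps that ambiguity. In a full pipeline, one such supervector per clip would be passed to an SVM classifier, for example scikit-learn's `SVC`.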

CLC Number: