Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (10): 2875-2879. DOI: 10.11772/j.issn.1001-9081.2016.10.2875

• Virtual Reality and Digital Media •

Two-person interaction recognition based on improved spatio-temporal interest points

WANG Peiyao1, CAO Jiangtao1, JI Xiaofei2

  1. School of Information and Control Engineering, Liaoning Shihua University, Fushun Liaoning 113001, China;
    2. School of Automation, Shenyang Aerospace University, Shenyang Liaoning 110136, China
  • Received: 2016-03-14    Revised: 2016-07-04    Published: 2016-10-10
  • Corresponding author: JI Xiaofei, E-mail: jixiaofei7804@126.com
  • About the authors: WANG Peiyao, born in 1991 in Shenyang, Liaoning, is an M.S. candidate whose research interests include video analysis and pattern recognition. CAO Jiangtao, born in 1978 in Yuncheng, Shandong, is a professor with a Ph.D. whose research interests include intelligent control and video analysis. JI Xiaofei, born in 1978 in Anshan, Liaoning, is an associate professor with a Ph.D. whose research interests include video analysis and pattern recognition.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61103123) and the Program for Liaoning Excellent Talents in University (LJQ2014018, LR2015034).


Abstract: To address the problems that interest point features of two-person interactions are poorly selected in practical surveillance video and that redundant words in the clustering dictionary lower the recognition rate, an interaction recognition method based on improved Spatio-Temporal Interest Point (STIP) features was proposed. Firstly, an untrackability detection method based on information entropy was introduced: the image sequence was tracked to obtain the foreground motion region of the interaction, and spatio-temporal interest points were extracted only within this region, which improves the accuracy of interest point detection. Secondly, the detected interest points were described with the 3-Dimensional Scale-Invariant Feature Transform (3D-SIFT) descriptor, and an improved Fuzzy C-Means (FCM) clustering method was used to build the visual dictionary and improve its distribution; on this basis, a Bag of Words (BOW) model was established, that is, the training samples were projected onto the dictionary to obtain a histogram feature representation for each frame. Finally, a frame-by-frame nearest neighbor classification method was adopted for two-person interaction recognition. Compared with recent STIP-feature-based algorithms, the proposed method achieved a correct recognition rate of 91.7% on the UT-Interaction dataset. The experimental results show that the improved bag-of-words algorithm, built on spatio-temporal interest points obtained through untrackability detection, can considerably improve the accuracy of interaction recognition and is suitable for two-person interaction recognition against dynamic backgrounds.
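The untrackability detection and restricted interest-point extraction step can be illustrated with a rough sketch. The paper's exact entropy formulation, tracker, and spatio-temporal detector are not reproduced here; the block size, thresholds, use of frame differencing, and the Shi-Tomasi corner detector standing in for a true spatio-temporal detector are all illustrative assumptions.

```python
# Illustrative sketch only: approximates "information-entropy-based untrackability
# detection" with block-wise intensity entropy plus frame differencing, then detects
# interest points only inside the resulting foreground mask.
import cv2
import numpy as np

def block_entropy(gray, block=16, bins=32):
    """Shannon entropy of the intensity histogram of each non-overlapping block."""
    h, w = gray.shape
    ent = np.zeros((h // block, w // block))
    for i in range(ent.shape[0]):
        for j in range(ent.shape[1]):
            patch = gray[i * block:(i + 1) * block, j * block:(j + 1) * block]
            counts, _ = np.histogram(patch, bins=bins, range=(0, 256))
            p = counts[counts > 0] / counts.sum()
            ent[i, j] = -np.sum(p * np.log2(p))
    return ent

def foreground_mask(prev_gray, gray, block=16, diff_thr=15.0, ent_thr=2.0):
    """Keep blocks that both move (frame difference) and carry enough texture
    (entropy) to be trackable; static or textureless blocks are discarded."""
    diff = cv2.absdiff(prev_gray, gray)
    ent = block_entropy(gray, block)
    mask = np.zeros_like(gray)
    for i in range(ent.shape[0]):
        for j in range(ent.shape[1]):
            sl = np.s_[i * block:(i + 1) * block, j * block:(j + 1) * block]
            if diff[sl].mean() > diff_thr and ent[i, j] > ent_thr:
                mask[sl] = 255
    return mask

# Usage: restrict interest-point detection to the foreground motion region.
# prev_gray, gray = two consecutive grayscale frames (uint8 arrays)
# mask = foreground_mask(prev_gray, gray)
# pts = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01,
#                               minDistance=5, mask=mask)
```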

Key words: Spatio-Temporal Interest Point (STIP), information entropy, two-person interaction recognition, Bag of Words (BOW) model, Fuzzy C-Means (FCM), 3-Dimensional Scale-Invariant Feature Transform (3D-SIFT), nearest neighbor classifier

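The dictionary-building and matching stages can also be sketched. The paper uses an improved FCM and frame-by-frame nearest neighbor matching; the sketch below instead uses plain fuzzy C-means, hard assignment of descriptors to visual words, and whole-video 1-nearest-neighbor matching, with descriptor dimensionality, word count, and distance metric chosen only for illustration.

```python
# Illustrative sketch: baseline fuzzy C-means dictionary, BOW histograms,
# and 1-NN classification (not the paper's improved FCM or frame-level matching).
import numpy as np

def fcm_dictionary(X, n_words, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Baseline fuzzy C-means; the returned cluster centers serve as visual words."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_words))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        U_new = 1.0 / dist ** (2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers

def bow_histogram(descriptors, centers):
    """Project descriptors onto the dictionary: hard-assign each descriptor to its
    nearest visual word and return the normalized word histogram."""
    dist = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    hist = np.bincount(dist.argmin(axis=1), minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)

def classify_1nn(test_hist, train_hists, train_labels):
    """1-nearest-neighbor on BOW histograms (Euclidean distance as a placeholder)."""
    d = np.linalg.norm(np.asarray(train_hists) - test_hist, axis=1)
    return train_labels[int(np.argmin(d))]

# Usage sketch with random stand-ins for 3D-SIFT descriptors (one array per video):
train_desc = [np.random.rand(300, 640) for _ in range(6)]
train_labels = ["hug", "kick", "point", "punch", "push", "shake-hands"]
dictionary = fcm_dictionary(np.vstack(train_desc), n_words=64)
train_hists = np.array([bow_histogram(d, dictionary) for d in train_desc])
test_desc = np.random.rand(280, 640)
print(classify_1nn(bow_histogram(test_desc, dictionary), train_hists, train_labels))
```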

CLC number: