Journal of Computer Applications, 2019, Vol. 39, Issue (12): 3482-3489. DOI: 10.11772/j.issn.1001-9081.2019061056

• Artificial Intelligence •

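Human behavior recognition algorithm based on three-dimensional residual dense network

GUO Mingxiang1,2, SONG Quanjun1, XU Zhannan1, DONG Jun1, XIE Chengjun1

  1. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei Anhui 230031, China;
  2. University of Science and Technology of China, Hefei Anhui 230026, China
  • Received: 2019-06-21 Revised: 2019-09-14 Online: 2019-10-15 Published: 2019-12-10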
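  • About the authors: GUO Mingxiang (1995-), born in Dazhou, Sichuan, M.S. candidate. His research interests include computer vision and intelligent robots. SONG Quanjun (1974-), born in Suzhou, Anhui, Ph.D., professor. His research interests include service robots and intelligent human-machine interaction. XU Zhannan (1987-), born in Nanyang, Henan, M.S., assistant research fellow. His research interests include intelligent robot control and human-machine interaction. DONG Jun (1973-), born in Hefei, Anhui, Ph.D., associate research fellow, CCF member. His research interests include computer vision and artificial intelligence. XIE Chengjun (1979-), born in Quanjiao, Anhui, Ph.D., associate research fellow. His research interests include computer vision, pattern recognition and machine learning.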
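  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2017YFC0806504) and the Science and Technology Strong Police Project of Anhui Province (201904d07020007).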

  • Contact: SONG Quanjun



Abstract: Existing human behavior recognition algorithms cannot fully exploit the multi-level spatio-temporal information in a network. To address this problem, a human behavior recognition algorithm based on a three-dimensional residual dense network is proposed. Firstly, the network uses three-dimensional residual dense blocks as its building blocks; each block extracts hierarchical features of human behavior through densely connected convolutional layers. Secondly, the local dense features of human behavior are learned by an adaptive local feature aggregation method. Thirdly, a residual connection is applied to facilitate the flow of feature information and to ease training. Finally, multiple three-dimensional residual dense blocks are cascaded to extract multi-level local features, and an adaptive global feature aggregation method learns from the features of all network layers to recognize human behavior. Structurally, the proposed network strengthens the extraction of multi-level spatio-temporal features and uses local and global feature aggregation to learn more discriminative features, which enhances the expressive power of the model. Experimental results on the benchmark datasets KTH and UCF-101 show that the recognition rate (top-1 accuracy) of the proposed algorithm reaches 93.52% and 57.35% respectively, which is 3.93 and 13.91 percentage points higher than that of the three-dimensional convolutional neural network (C3D) algorithm. The proposed framework has good robustness and transfer learning ability, and can effectively handle a variety of video behavior recognition tasks.

Key words: human behavior recognition, video classification, three-dimensional residual dense network, deep learning, feature aggregation
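
The block structure described in the abstract can be illustrated with a small channel-bookkeeping sketch. This is not the authors' implementation. It assumes the usual residual dense design, in which each dense layer adds a fixed number of channels, local aggregation fuses the concatenated features back to the block's input width (for example with a 1x1x1 convolution), and global aggregation concatenates the outputs of all blocks. The names `plan_block`, `plan_network`, `growth_rate`, `num_layers` and `num_blocks`, and the numeric values, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockPlan:
    dense_in_channels: List[int]  # input channels seen by each dense conv layer
    fusion_in_channels: int       # channels concatenated for local aggregation
    out_channels: int             # channels after fusion and the residual add


def plan_block(in_channels: int, growth_rate: int, num_layers: int) -> BlockPlan:
    """Channel plan for one hypothetical 3D residual dense block."""
    # Dense connectivity: layer i sees the block input plus the outputs of
    # all i preceding layers, each contributing growth_rate channels.
    dense_in = [in_channels + i * growth_rate for i in range(num_layers)]
    # Local feature aggregation (assumed): concatenate the block input with
    # every layer output, then fuse back down to in_channels.
    fusion_in = in_channels + num_layers * growth_rate
    # The residual connection adds the block input element-wise, so the
    # block output must have in_channels channels.
    return BlockPlan(dense_in, fusion_in, in_channels)


def plan_network(in_channels: int, growth_rate: int, num_layers: int,
                 num_blocks: int) -> Tuple[List[BlockPlan], int]:
    """Channel plan for a cascade of blocks with global aggregation."""
    blocks = [plan_block(in_channels, growth_rate, num_layers)
              for _ in range(num_blocks)]
    # Global feature aggregation (assumed): concatenate all block outputs.
    global_fusion_in = sum(b.out_channels for b in blocks)
    return blocks, global_fusion_in


# Illustrative values only.
blocks, global_in = plan_network(in_channels=64, growth_rate=32,
                                 num_layers=4, num_blocks=3)
print(blocks[0].dense_in_channels)   # [64, 96, 128, 160]
print(blocks[0].fusion_in_channels)  # 192
print(global_in)                     # 192
```

The sketch tracks only channel counts. In a real network the temporal and spatial dimensions would also have to match for the concatenations and the residual addition to be valid.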
