Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (3): 721-726. DOI: 10.11772/j.issn.1001-9081.2020060958

Special topic: Artificial Intelligence

• Artificial Intelligence •

  • Corresponding author: JIANG Li
  • About the authors: JIANG Li, born in 1990 in Hechuan, Chongqing, is a teaching assistant with a master's degree; her research interests include computer vision and intelligent information processing. HUANG Shijian, born in 1983 in Renshou, Sichuan, is an associate professor with a doctoral degree; his research interests include computer vision and machine learning. YAN Wenjuan, born in 1976 in Chongqing, is an associate professor with a master's degree; her research interests include machine learning and pattern recognition.

Human action recognition method based on low-rank action information and multi-scale convolutional neural network

JIANG Li, HUANG Shijian, YAN Wenjuan   

  1. School of Electronic Information Engineering, Yangtze Normal University, Chongqing 408100, China
  • Received:2020-07-06 Revised:2020-10-11 Online:2021-03-10 Published:2021-01-15
  • Supported by:
    This work is partially supported by the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJQN202001421).


Abstract: Traditional methods of acquiring action information for human action recognition require cumbersome steps and rest on various assumptions. Considering the superior performance of Convolutional Neural Network (CNN) in image and video processing, a human action recognition method based on Low-rank Action Information (LAI) and Multi-scale Convolutional Neural Network (MCNN) was proposed. First, the action video was divided into several segments, and the LAI of each segment was extracted by low-rank learning on that segment; the LAI of all segments was then concatenated along the time axis to obtain the LAI of the whole video. This effectively captures the action information in the video while avoiding cumbersome extraction steps and various assumptions. Second, an MCNN model was designed according to the characteristics of LAI. In this model, multi-scale convolution kernels capture the action features of LAI under different receptive fields, and the convolution, pooling and fully connected layers were arranged to further refine these features and finally output the action category. The performance of the proposed method was verified on two benchmark databases, KTH and HMDB51, with three groups of comparison experiments designed and carried out. Experimental results show that the proposed method achieves recognition rates of 97.33% and 72.05% on the two databases, which are at least 0.67 and 1.15 percentage points higher, respectively, than those of the Two-Fold Transformation (TFT) and Deep Temporal Embedding Network (DTEN) methods. The proposed method can further promote the wide application of action recognition technology in security, human-computer interaction and other fields.
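The abstract does not give the optimization used for low-rank learning, so the pipeline it describes (split the video into segments, extract a low-rank summary per segment, concatenate along the time axis) can only be sketched with a stand-in. The sketch below uses truncated SVD as an illustrative low-rank approximation; the segment count, rank, and the mean-over-time collapse into a single image are all assumptions, not the paper's formulation.

```python
import numpy as np

def segment_lai(frames, rank=1):
    """Illustrative low-rank summary of one video segment.
    frames: (t, h, w) array; each frame becomes one column of the data matrix."""
    t, h, w = frames.shape
    D = frames.reshape(t, h * w).T                    # (h*w, t) data matrix
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    L = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]   # rank-r approximation of D
    # Collapse the segment's low-rank part into one image (mean over time);
    # this collapse is a guess, the paper may define LAI differently.
    return L.mean(axis=1).reshape(h, w)

def video_lai(video, n_segments=4, rank=1):
    """Split the video into segments, summarize each, and stack the
    per-segment summaries along the time axis -> (n_segments, h, w)."""
    segments = np.array_split(video, n_segments, axis=0)
    return np.stack([segment_lai(seg, rank) for seg in segments])

video = np.random.rand(32, 48, 64)   # toy clip: 32 frames of 48x64 pixels
lai = video_lai(video)
print(lai.shape)                     # (4, 48, 64)
```

The stacked output plays the role of the whole-video LAI that is then fed to the classification network.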

Key words: action recognition, low-rank learning, action information, multi-scale, Convolutional Neural Network (CNN)
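The MCNN described above (parallel convolution kernels of different sizes feeding pooling and fully connected layers) can be sketched as follows, assuming PyTorch. The kernel sizes (3/5/7), channel counts, pooling sizes, and class count are illustrative choices; the paper's exact configuration is not given in the abstract.

```python
import torch
import torch.nn as nn

class MCNN(nn.Module):
    """Sketch of a multi-scale CNN: parallel branches with different
    kernel sizes view the LAI under different receptive fields, their
    features are concatenated, then refined by pooling and a fully
    connected classifier. All hyperparameters here are assumptions."""
    def __init__(self, in_channels=4, n_classes=6):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_channels, 16, k, padding=k // 2),
                          nn.ReLU(), nn.MaxPool2d(2))
            for k in (3, 5, 7)                 # three receptive-field scales
        ])
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),           # pool to a fixed spatial size
            nn.Flatten(),
            nn.Linear(3 * 16 * 4 * 4, n_classes)  # fully connected classifier
        )

    def forward(self, x):
        # Concatenate the multi-scale feature maps along the channel axis.
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.head(feats)

model = MCNN()
scores = model(torch.randn(2, 4, 48, 64))  # batch of 2 four-segment LAI stacks
print(scores.shape)                        # torch.Size([2, 6])
```

Treating the per-segment LAI images as input channels is one plausible way to feed the temporal stack to a 2D CNN; the paper may arrange the input differently.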

CLC number: