Journal of Computer Applications
马骋昊 (1), 林垠 (2, 3), 陈叶瀚森 (3), 殷保才 (3), 高建清 (3)
Abstract: With the advancement of intelligent human-computer interaction technology, gesture recognition based on 2D images has been widely applied in many fields. Existing methods decompose gesture recognition into two independent stages, hand tracking and gesture classification: they first determine the region and motion trajectory of each hand, and then crop the corresponding image region to classify the gesture. However, these methods rely heavily on the performance of the front-end model (such as the hand detector), and in multi-person scenarios their computational cost grows linearly with the number of hands to be recognized, so they fail to strike a good balance between efficiency and accuracy. To address these issues, an Efficient Query-Based Multi-Object Gesture Recognition (EQMGR) algorithm was proposed. The method performs multi-object gesture recognition end to end by combining multiple learnable query vectors with an attention mechanism. Each query adaptively attends to one specific person in the whole image, so the gestures of all objects in the image are recognized in a single inference pass. Furthermore, by propagating the queries from frame to frame, the query vectors model the temporal features of each object at no additional computational cost, enabling high-accuracy recognition of both dynamic and static gestures. To validate the method on multi-object dynamic and static gesture recognition, a multi-object gesture recognition dataset was collected and annotated. Experimental results on this dataset show that EQMGR achieves a precision of 93.2% and a recall of 96.1%, while running at 25.2 frames per second (FPS) on a single GPU, demonstrating efficient and accurate gesture recognition.
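The query mechanism summarized in the abstract can be illustrated with a minimal pure-Python sketch. This is not the EQMGR implementation, since the abstract does not specify the network: the helper names (`cross_attend`, `classify`, `run_video`), the single attention step whose keys and values are the raw feature tokens, the absence of learned projections and normalization, and the linear gesture head are all illustrative assumptions. The sketch shows only the two ideas the abstract names: a fixed set of queries attends over all feature tokens of a frame, so one pass yields one gesture prediction per query however many people are present, and the updated queries are carried into the next frame as its starting queries (frame-wise query propagation).

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, features):
    """One simplified cross-attention step (assumption: keys = values =
    features, no learned projections). Each query attends over every
    feature token of the frame and is updated with a residual connection."""
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    updated, weights = [], []
    for q in queries:
        w = softmax([dot(q, f) * scale for f in features])
        context = [sum(wi * f[j] for wi, f in zip(w, features)) for j in range(d)]
        updated.append([qj + cj for qj, cj in zip(q, context)])
        weights.append(w)
    return updated, weights


def classify(queries, head):
    """Hypothetical linear gesture head: one logit per class, argmax per query."""
    preds = []
    for q in queries:
        logits = [dot(row, q) for row in head]
        preds.append(max(range(len(logits)), key=logits.__getitem__))
    return preds


def run_video(frames, init_queries, head):
    """Process frames in order. The queries output for frame t become the
    input queries for frame t+1 (frame-wise query propagation), so no
    per-hand crop or extra pass is needed to carry temporal context."""
    queries = init_queries
    per_frame = []
    for features in frames:
        queries, weights = cross_attend(queries, features)
        per_frame.append((classify(queries, head), weights))
    return per_frame, queries


# Toy usage with random stand-ins for learned parameters and image features.
rng = random.Random(0)
D, NUM_QUERIES, NUM_TOKENS, NUM_CLASSES = 8, 3, 5, 4


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(D)]


init_queries = [rand_vec() for _ in range(NUM_QUERIES)]
head = [rand_vec() for _ in range(NUM_CLASSES)]
frames = [[rand_vec() for _ in range(NUM_TOKENS)] for _ in range(2)]

results, final_queries = run_video(frames, init_queries, head)
for t, (preds, _) in enumerate(results):
    print(f"frame {t}: gesture class per query = {preds}")
```

Because the number of queries is fixed, the per-frame attention work in this sketch scales with (number of queries × number of feature tokens) instead of with the number of hands that must be cropped and classified separately, which is the contrast with two-stage pipelines drawn in the abstract. In this simplified version the residual updates are unnormalized, so query magnitudes grow across frames; a practical model would typically add learned projections and normalization.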
Key words: multi-object gesture recognition, efficient dynamic and static gesture recognition, learnable query, frame-wise query propagation
CLC Number: TP391.4
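马骋昊, 林垠, 陈叶瀚森, 殷保才, 高建清. Efficient multi-object gesture recognition method based on learnable query vectors (基于可学习查询向量的高效多目标手势识别方法) [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2024111577.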
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024111577