Journal of Computer Applications (《计算机应用》), sole official website


3D object detection network based on self-attention mechanism and graph convolution

Yue LIU (刘越), Fang LIU (刘芳), Aoyun WU (武奥运), Qiuyue CHAI (柴秋月), Tianxiao WANG (王天笑)

  1. Changchun University of Technology (Nanhu Campus)
  • Received: 2023-06-16 Revised: 2023-08-09 Online: 2023-09-11 Published: 2023-09-11
  • Corresponding author: Fang LIU (刘芳)

Abstract: In three-dimensional (3D) object detection, small objects such as cyclists and pedestrians are detected with low accuracy, which makes it difficult to adapt to complex urban road conditions. To address this problem, a 3D object detection network based on self-attention mechanism and graph convolution was proposed. Firstly, to obtain more discriminative small-object features, a self-attention mechanism was introduced into the backbone network, making the network more sensitive to small-object features and improving its feature extraction ability. Secondly, a feature fusion module was built on the self-attention mechanism to further enrich the shallow network features and enhance the feature expression ability of the deep network. Finally, dynamic graph convolution was used to predict the bounding boxes of objects, improving the prediction accuracy of the network. The network was evaluated on the KITTI dataset and compared with eight mainstream networks, including TANet (Triple Attention Network) and IA-SSD (Instance-Aware Single-Stage Detector). Among the eight compared networks, TANet had the best pedestrian detection accuracy and IA-SSD had the best cyclist detection accuracy. Compared with TANet, the pedestrian detection accuracy of the proposed network was higher by 12.12, 13.82 and 11.03 percentage points at the three difficulty levels; compared with IA-SSD, its cyclist detection accuracy was higher by 3.06 and 5.34 percentage points at the moderate and hard levels. In summary, the proposed network is better suited to small object detection tasks.

Key words: three-dimensional (3D) object detection, self-attention mechanism, feature fusion, dynamic graph convolution, small object detection
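
The abstract states that a self-attention mechanism is introduced into the backbone, but it does not give the exact formulation. As an illustration only, the following minimal sketch assumes standard scaled dot-product self-attention applied to a set of N feature vectors (for example, point or voxel features). The function names, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a set of feature vectors.

    features: (N, C) array, one row per point/voxel (assumed layout).
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical parameters).
    Returns the attended features (N, D) and the attention weights (N, N).
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    # Each row of `scores` compares one element with every element in the set.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 8))        # 6 elements, 8 channels
    w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
    attended, attn = self_attention(feats, w_q, w_k, w_v)
    print(attended.shape, attn.shape)
```

Because every output row is a weighted sum over all input rows, features that belong to a small object can draw on context from the whole scene, which is one plausible reading of why attention would help with pedestrians and cyclists.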

CLC number:
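
The abstract also mentions dynamic graph convolution for bounding box prediction without describing the layer itself. The sketch below assumes an EdgeConv-style layer, which is a common form of dynamic graph convolution: a k-nearest-neighbour graph is rebuilt from the current features at every layer, edge features are formed from each centre and its neighbour offsets, and the results are max-pooled per node. The helper names, the weight shapes, and the choice of k are hypothetical, and how the paper connects such a layer to its detection head is not specified in the abstract.

```python
import numpy as np


def knn_indices(x, k):
    """Indices of the k nearest neighbours of each row of x, excluding itself."""
    diff = x[:, None, :] - x[None, :, :]
    dist = (diff ** 2).sum(axis=-1)          # (N, N) squared distances
    np.fill_diagonal(dist, np.inf)           # a node is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]   # (N, k)


def edge_conv(x, weight, k):
    """One EdgeConv-style layer (assumed form of dynamic graph convolution).

    x: (N, C) node features; weight: (2C, D) shared linear map; requires k < N.
    Returns (N, D) features after ReLU and max-pooling over the k neighbours.
    """
    x = np.asarray(x, dtype=float)
    idx = knn_indices(x, k)                          # graph built from current x
    neighbors = x[idx]                               # (N, k, C)
    center = np.repeat(x[:, None, :], k, axis=1)     # (N, k, C)
    edge = np.concatenate([center, neighbors - center], axis=-1)  # (N, k, 2C)
    hidden = np.maximum(edge @ weight, 0.0)          # shared linear map + ReLU
    return hidden.max(axis=1)                        # aggregate over neighbours


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(10, 3))                # 10 points in 3D
    w1 = rng.normal(size=(6, 16))
    w2 = rng.normal(size=(32, 16))
    h1 = edge_conv(points, w1, k=4)   # neighbours found in coordinate space
    h2 = edge_conv(h1, w2, k=4)       # neighbours recomputed in feature space
    print(h1.shape, h2.shape)
```

The "dynamic" part is the second call: the neighbourhood graph is recomputed from `h1` rather than reused from the input coordinates, so points that are far apart in space but similar in feature space can become neighbours.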