Journal of Computer Applications, 2024, Vol. 44, Issue (6): 1972-1977. DOI: 10.11772/j.issn.1001-9081.2023060767

Special Issue: Frontier and comprehensive applications

• Frontier and comprehensive applications •

3D object detection network based on self-attention mechanism and graph convolution

Yue LIU, Fang LIU, Aoyun WU, Qiuyue CHAI, Tianxiao WANG

  1. School of Electrical and Electronic Engineering, Changchun University of Technology, Changchun Jilin 130012, China
  • Received: 2023-06-19 Revised: 2023-08-09 Accepted: 2023-08-15 Online: 2023-09-11 Published: 2024-06-10
  • Contact: Yue LIU
  • About author: LIU Fang, born in 1998, M. S. candidate. Her research interests include computer vision, object detection.
    WU Aoyun, born in 1996, M. S. candidate. His research interests include computer vision, object detection.
    CHAI Qiuyue, born in 1998, M. S. candidate. Her research interests include deep learning, image processing.
    WANG Tianxiao, born in 1997, M. S. candidate. His research interests include deep learning, image processing.
  • Supported by:
    Science and Technology Development Plan of Jilin Province (20220204090YY)

基于自注意力机制与图卷积的3D目标检测网络

刘越, 刘芳, 武奥运, 柴秋月, 王天笑

  1. 长春工业大学 电气与电子工程学院,长春 130012
  • 通讯作者: 刘越
  • 作者简介:刘芳(1998—),女,山东济宁人,硕士研究生,主要研究方向:计算机视觉、目标检测
    武奥运(1996—),男,安徽阜阳人,硕士研究生,主要研究方向:计算机视觉、目标检测
    柴秋月(1998—),女,河北衡水人,硕士研究生,主要研究方向:深度学习、图像处理
    王天笑(1997—),男,安徽宿州人,硕士研究生,主要研究方向:深度学习、图像处理。
  • 基金资助:
    吉林省科技发展计划项目(20220204090YY)

Abstract:

To address the low detection accuracy for small objects such as cyclists and pedestrians in Three-Dimensional (3D) object detection, and the resulting difficulty of adapting to complex urban road conditions, a 3D object detection network based on self-attention mechanism and graph convolution was proposed. Firstly, to obtain more discriminative small-object features, a self-attention mechanism was introduced into the backbone network, making the network more sensitive to small-object features and strengthening its feature extraction ability. Secondly, a feature fusion module was built on the self-attention mechanism to further enrich the shallow-network features and enhance the feature expression ability of the deep network. Finally, dynamic graph convolution was used to predict the bounding boxes of objects, improving prediction accuracy. The proposed network was evaluated on the KITTI dataset and compared with eight mainstream networks, including TANet (Triple Attention Network) and IA-SSD (Instance-Aware Single-Stage Detector). The experimental results show that, compared with TANet, which has the second-best pedestrian detection accuracy, the pedestrian detection accuracy of the proposed network is higher by 12.12, 13.82 and 11.03 percentage points at the easy, moderate and hard difficulty levels, respectively; compared with IA-SSD, its cyclist detection accuracy is higher by 3.06 and 5.34 percentage points at the moderate and hard levels, respectively. In summary, the proposed network is better suited to small object detection tasks.

Key words: Three-Dimensional (3D) object detection, self-attention mechanism, feature fusion, dynamic graph convolution, small object detection
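
The abstract names two building blocks, self-attention and dynamic graph convolution, without giving their formulation. The sketch below is a generic illustration only, not the authors' network: it assumes standard scaled dot-product self-attention over per-point features and an EdgeConv-style layer whose k-nearest-neighbour graph is rebuilt from the current features (which is what makes the graph "dynamic"). All function names, weight shapes, and the choice of k are hypothetical.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over N point features, x: (N, C)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])          # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ v

def knn_indices(feats, k):
    """Indices of the k nearest neighbours of each row, excluding the row itself."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                    # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def edge_conv(feats, w, k):
    """EdgeConv-style layer; the graph is recomputed from the input features."""
    idx = knn_indices(feats, k)                     # (N, k)
    neighbours = feats[idx]                         # (N, k, C)
    centre = np.repeat(feats[:, None, :], k, axis=1)
    edge = np.concatenate([centre, neighbours - centre], axis=-1)  # (N, k, 2C)
    return np.maximum(edge @ w, 0.0).max(axis=1)    # ReLU, then max over neighbours

# Hypothetical toy input: 32 points with 8-dimensional features.
rng = np.random.default_rng(0)
points = rng.normal(size=(32, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
attended = self_attention(points, w_q, w_k, w_v)    # (32, 8)
w_edge = rng.normal(size=(16, 4)) * 0.1             # maps 2C = 16 to 4 channels
out = edge_conv(attended, w_edge, k=5)              # (32, 4)
```

In this toy setting the attention step lets every point weigh every other point, while the edge convolution aggregates only over neighbours that are close in feature space; how the paper combines the two in its backbone, fusion module, and box prediction head is not specified in the abstract.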

摘要:

针对三维(3D)目标检测过程中对骑行者、行人等小目标检测的准确性较低,难以适应城市复杂路况的问题,提出一种基于自注意力机制与图卷积的3D目标检测网络。首先,为获取更具有判别性的小目标特征,在主干网络中引入自注意力机制,使网络对小目标特征更敏感,增强网络特征的提取能力;其次,在自注意力机制的基础上构建特征融合模块,进一步丰富浅层网络特征,增强深层网络的特征表达能力;最后,引用动态图卷积预测目标的边界框,提高目标预测的准确性。在KITTI数据集上进行实验,将所提网络与TANet(Triple Attention Network)、IA-SSD(Instance-Aware Single-Stage Detector)等8种主流网络对比。实验结果表明,所提网络对行人的检测精度在简单、中等和困难这3个难度下比行人检测精度次优的TANet提高了12.12、13.82和11.03个百分点,对骑行者的检测精度在中等和困难上比IA-SSD提高了3.06和5.34个百分点。综上所述,所提网络可以更好地应用于小目标检测任务。

关键词: 三维目标检测, 自注意力机制, 特征融合, 动态图卷积, 小目标检测

CLC Number: