Journal of Computer Applications
颜承志1,陈颖2,钟凯3,高寒3
Abstract: In 3D object detection, the detection accuracy for small targets such as pedestrians and cyclists remains low, which poses a challenge for the perception systems of autonomous vehicles. To estimate the state of the surrounding environment accurately and enhance driving safety, this study proposes an improved 3D object detection algorithm based on a multi-scale network and axial attention, built on the Voxel-RCNN (Voxel Region-based Convolutional Neural Network) framework. Firstly, a multi-scale network and a Pixel-level Fusion Module (PFM) were integrated into the backbone network to obtain richer and more precise feature representations, enhancing robustness and generalization in complex scenarios. Secondly, an axial attention mechanism designed for features with 3D spatial dimensions was applied to the Region of Interest (RoI) multi-scale pooling features. This mechanism captures both local and global features while preserving essential information in the 3D spatial structure, thereby improving the accuracy and efficiency of object detection and classification. Finally, a Rotation-Decoupled Intersection over Union (RDIoU) method was incorporated into the regression and classification branches, enabling the network to learn more precise bounding boxes and overcoming the misalignment between classification and regression. Experimental results on the KITTI public dataset show that the proposed algorithm achieved mean Average Precision (mAP) values of 62.25% for pedestrians and 79.36% for cyclists, improvements of 4.02 and 3.15 percentage points, respectively, over the baseline algorithm, Voxel-RCNN. These results demonstrate the effectiveness of the improved algorithm in detecting hard-to-perceive targets.
Key words: 3D object detection, multi-scale network, feature fusion, axial attention, loss function
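The axial attention step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is a generic form of axial attention and not necessarily the design used in the paper: the abstract does not specify the layer layout, so the 6×6×6 RoI grid, the channel width of 8, the random projection matrices, the sequential application over the three axes, and the residual connections are all assumptions made for illustration. The function names `attend_along_axis` and `axial_attention_3d` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_along_axis(feat, axis, wq, wk, wv):
    """Self-attention restricted to one spatial axis of an (X, Y, Z, C) grid.

    Each line of voxels along `axis` is treated as an independent sequence.
    """
    # Move the chosen spatial axis next to the channel axis: (..., L, C).
    x = np.moveaxis(feat, axis, -2)
    q, k, v = x @ wq, x @ wk, x @ wv
    # Scaled dot-product scores between positions on the same line: (..., L, L).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v
    # Restore the original axis order.
    return np.moveaxis(out, -2, axis)

def axial_attention_3d(feat, weights):
    """Apply attention along X, then Y, then Z, each with a residual connection.

    `weights` holds one (wq, wk, wv) triple per spatial axis (an assumed layout).
    """
    out = feat
    for axis, (wq, wk, wv) in enumerate(weights):
        out = out + attend_along_axis(out, axis, wq, wk, wv)
    return out

# Hypothetical RoI feature grid: 6 x 6 x 6 voxels with 8 channels.
rng = np.random.default_rng(0)
C = 8
feat = rng.standard_normal((6, 6, 6, C))
weights = [tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(3))
           for _ in range(3)]
out = axial_attention_3d(feat, weights)
print(out.shape)  # (6, 6, 6, 8)
```

Under these assumptions, each of the 216 grid positions attends to 6 positions per axis (18 in total across the three passes) instead of all 216 positions as in full self-attention, while the (X, Y, Z) layout of the features is kept intact throughout.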
CLC Number: TP391.41
颜承志, 陈颖, 钟凯, 高寒. 3D object detection algorithm based on multi-scale network and axial attention[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2024071058.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024071058