In 3D object detection, the detection accuracy for small targets such as pedestrians and cyclists remains low, posing a challenge to the perception systems of autonomous vehicles. To estimate the state of the surrounding environment accurately and enhance driving safety, a 3D object detection algorithm based on a multi-scale network and axial attention was proposed by improving the Voxel R-CNN (Voxel Region-based Convolutional Neural Network) algorithm. Firstly, a multi-scale network with a Pixel-level Fusion Module (PFM) was constructed in the backbone network to obtain richer and more precise feature representations, thereby enhancing the robustness and generalization of the algorithm in complex scenarios. Secondly, an axial attention mechanism tailored to features in the 3D spatial dimensions was designed and applied to the multi-scale Region of Interest (RoI) pooling features, so as to capture both local and global features effectively while preserving the essential information of the 3D spatial structure, thereby improving the accuracy and efficiency of object detection and classification. Finally, a Rotation-Decoupled Intersection over Union (RDIoU) method was introduced into the regression and classification branches, enabling the network to learn more precise bounding boxes and alleviating the misalignment between classification and regression. Experimental results on the KITTI public dataset show that the proposed algorithm achieves a mean Average Precision (mAP) of 62.25% for pedestrians and 79.36% for cyclists, improvements of 4.02 and 3.15 percentage points respectively over the baseline Voxel R-CNN, demonstrating the effectiveness of the improved algorithm in detecting hard-to-perceive objects.
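The core idea of axial attention over 3D features can be sketched as follows: instead of attending over all D×H×W voxel positions at once (quadratic in DHW), self-attention is applied along one spatial axis at a time, reducing the cost while still propagating information globally through successive axes. This is a minimal NumPy illustration of that general principle, not the paper's implementation; the function names, the identity Q/K/V projections, and the residual connections are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention_1d(x, axis):
    """Self-attention along a single spatial axis of a (D, H, W, C) feature map.

    Identity Q/K/V projections keep the sketch minimal; a real module would
    use learned linear projections. (Illustrative assumption.)
    """
    x = np.moveaxis(x, axis, -2)                       # (..., L, C)
    c = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(c)   # (..., L, L)
    out = softmax(scores, axis=-1) @ x                 # (..., L, C)
    return np.moveaxis(out, -2, axis)

def axial_attention_3d(x):
    # attend sequentially over the depth, height, and width axes,
    # with a residual connection after each pass
    for axis in (0, 1, 2):
        x = x + axial_attention_1d(x, axis)
    return x

feat = np.random.rand(4, 5, 6, 16)   # toy (D, H, W, C) voxel feature volume
out = axial_attention_3d(feat)
print(out.shape)                     # (4, 5, 6, 16) — shape is preserved
```

Attending per axis costs O(DHW·(D+H+W)) score entries instead of O((DHW)²), which is what makes attention affordable on 3D voxel grids such as RoI pooling features.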