Abstract: VoxelNet is the first end-to-end object detection model based on point clouds; taking only point cloud data as input, it achieves good results. However, because VoxelNet takes the point cloud of the full scene as input, much of the computation is spent on background points, and false detections and missed detections easily occur in complex scenes, since a point cloud carrying only geometric information offers low recognition granularity on the targets. To solve these problems, an improved VoxelNet model augmented with view frustums was proposed. Firstly, the targets of interest were located in the RGB front-view image. Then, the 2D targets were lifted into 3D, turning each target into a spatial view frustum. Finally, the view frustum candidate regions were extracted from the point cloud to filter out redundant points, and only the points within the candidate regions were processed to obtain the detection results. Compared with VoxelNet, the improved algorithm reduces the computational complexity of point cloud processing and avoids computation on background points, thereby increasing detection efficiency. At the same time, it avoids interference from redundant background points and decreases the false detection rate and missed detection rate. Experimental results on the KITTI dataset show that the improved algorithm outperforms VoxelNet in 3D detection, with average precisions of 67.92%, 59.98% and 53.95% at the easy, moderate and hard levels respectively.
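To make the frustum extraction step concrete, below is a minimal sketch of how points inside a 2D detection's viewing frustum can be selected, assuming KITTI-style calibration matrices (P2, R0_rect, Tr_velo_to_cam); the function name frustum_filter and the exact box format are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def frustum_filter(points, box2d, P2, R0_rect, Tr_velo_to_cam):
    # Illustrative sketch, assuming KITTI calibration conventions.
    # points: (N, 4) LiDAR points [x, y, z, reflectance]
    # box2d:  (xmin, ymin, xmax, ymax) from the 2D detector on the RGB image
    # P2 (3x4), R0_rect (3x3), Tr_velo_to_cam (3x4): calibration matrices
    n = points.shape[0]
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])   # homogeneous LiDAR coords, (N, 4)
    cam = R0_rect @ (Tr_velo_to_cam @ xyz1.T)            # rectified camera frame, (3, N)
    in_front = cam[2] > 0                                # keep points in front of the camera
    img = P2 @ np.vstack([cam, np.ones((1, n))])         # homogeneous pixel coords, (3, N)
    u, v = img[0] / img[2], img[1] / img[2]              # perspective division
    xmin, ymin, xmax, ymax = box2d
    inside = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    return points[in_front & inside]                     # points within the view frustum
```

The filtered points would then be voxelized and fed to the VoxelNet detection pipeline in place of the full-scene point cloud.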