Journal of Computer Applications, 2024, Vol. 44, Issue (7): 2208-2215. DOI: 10.11772/j.issn.1001-9081.2023070990
• Multimedia computing and computer simulation •
Xun SUN¹, Ruifeng FENG², Yanru CHEN²
Received: 2023-07-21
Revised: 2023-09-26
Accepted: 2023-09-28
Online: 2023-10-26
Published: 2024-07-10
Contact: Ruifeng FENG
About author: SUN Xun, born in 1972, a native of Wuhan, Hubei, senior engineer. Her research interests include intelligent stations and yards, logistics planning, and station and yard design.
Xun SUN, Ruifeng FENG, Yanru CHEN. Monocular 3D object detection method integrating depth and instance segmentation[J]. Journal of Computer Applications, 2024, 44(7): 2208-2215.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023070990
Item | Configuration or version |
---|---|
CPU | Intel Xeon Gold 5320 |
Memory | 120 GB |
GPU | A30×2 |
OS | Ubuntu 20.04 |
CUDA | 11.4 |
Tab. 1 Experimental configuration
Method | AP3D Easy | AP3D Mod | AP3D Hard | APBEV Easy | APBEV Mod | APBEV Hard |
---|---|---|---|---|---|---|
Mono3D | 2.53 | 2.31 | 2.31 | 5.22 | 5.19 | 4.13 |
MF3D | 10.53 | 5.69 | 5.39 | 22.03 | 13.63 | 11.60 |
MonoGRNet | 13.88 | 10.19 | 7.62 | 24.97 | 19.44 | 16.30 |
M3D-RPN | 20.27 | 17.06 | 15.21 | 25.94 | 21.18 | 17.90 |
D4LCN | 22.32 | 16.20 | 12.30 | 31.53 | 22.58 | 17.87 |
SMOKE | 14.76 | 12.85 | 11.50 | 19.99 | 15.61 | 15.28 |
MonoDistill | 18.05 | 14.98 | 13.42 | 24.26 | 18.43 | 16.95 |
Proposed method | 24.91 | 21.03 | 17.28 | 33.40 | 25.03 | 19.80 |
Tab. 2 Performance comparison of different methods on bird's-eye-view (BEV) and 3D bounding box detection (IoU≥0.7)
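
The margins implied by Tab. 2 can be checked with a few lines of arithmetic. The sketch below is not part of the paper: the names `COLUMNS`, `BASELINES`, `PROPOSED`, and `margins_over_best_baseline` are hypothetical, and the numbers are simply transcribed from the table. For each column it finds the strongest compared method and reports how far the proposed method is ahead of it.

```python
# Values transcribed from Tab. 2 (IoU >= 0.7). Column order:
# AP3D Easy/Mod/Hard followed by APBEV Easy/Mod/Hard.
# All names below are illustrative, not taken from the paper.
COLUMNS = ["AP3D Easy", "AP3D Mod", "AP3D Hard",
           "APBEV Easy", "APBEV Mod", "APBEV Hard"]

BASELINES = {
    "Mono3D":      [2.53, 2.31, 2.31, 5.22, 5.19, 4.13],
    "MF3D":        [10.53, 5.69, 5.39, 22.03, 13.63, 11.60],
    "MonoGRNet":   [13.88, 10.19, 7.62, 24.97, 19.44, 16.30],
    "M3D-RPN":     [20.27, 17.06, 15.21, 25.94, 21.18, 17.90],
    "D4LCN":       [22.32, 16.20, 12.30, 31.53, 22.58, 17.87],
    "SMOKE":       [14.76, 12.85, 11.50, 19.99, 15.61, 15.28],
    "MonoDistill": [18.05, 14.98, 13.42, 24.26, 18.43, 16.95],
}

PROPOSED = [24.91, 21.03, 17.28, 33.40, 25.03, 19.80]


def margins_over_best_baseline(baselines, proposed):
    """Map each column to (best compared method, margin of the proposed method)."""
    result = {}
    for i, col in enumerate(COLUMNS):
        # Strongest compared method in this column.
        best = max(baselines, key=lambda name: baselines[name][i])
        result[col] = (best, round(proposed[i] - baselines[best][i], 2))
    return result


if __name__ == "__main__":
    for col, (name, margin) in margins_over_best_baseline(BASELINES, PROPOSED).items():
        print(f"{col}: +{margin:.2f} over {name}")
```

On these transcribed values the strongest compared method is either D4LCN or M3D-RPN, depending on the column, and the proposed method is ahead in all six columns.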
Method | AP3D Easy | AP3D Mod | AP3D Hard | APBEV Easy | APBEV Mod | APBEV Hard |
---|---|---|---|---|---|---|
Baseline | 18.28 | 14.67 | 13.38 | 26.38 | 19.88 | 16.27 |
+① | 19.41 | 16.49 | 14.41 | 28.12 | 21.30 | 17.54 |
+①+④ | 22.18 | 16.63 | 15.57 | 28.96 | 23.07 | 17.87 |
+①+② | 20.89 | 14.92 | 14.12 | 27.42 | 21.81 | 16.82 |
+①+②+③ | 22.39 | 17.64 | 15.77 | 29.16 | 23.22 | 18.73 |
+①+②+③+④ | 24.91 | 21.03 | 17.28 | 33.40 | 25.03 | 19.80 |
Tab. 3 Ablation study results (IoU≥0.7)
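
To read Tab. 3 as gains rather than absolute scores, each configuration can be compared against the baseline row. The following is a minimal sketch under stated assumptions: `ABLATION` and `gains_over_baseline` are hypothetical names, the values are transcribed from the table, and the circled numbers are kept as opaque module labels because this page does not say what each one denotes.

```python
# Values transcribed from Tab. 3 (IoU >= 0.7). Column order:
# AP3D Easy/Mod/Hard followed by APBEV Easy/Mod/Hard.
# The circled numbers are the table's own module labels, used here as-is.
ABLATION = {
    "Baseline": [18.28, 14.67, 13.38, 26.38, 19.88, 16.27],
    "+①":       [19.41, 16.49, 14.41, 28.12, 21.30, 17.54],
    "+①+④":     [22.18, 16.63, 15.57, 28.96, 23.07, 17.87],
    "+①+②":     [20.89, 14.92, 14.12, 27.42, 21.81, 16.82],
    "+①+②+③":   [22.39, 17.64, 15.77, 29.16, 23.22, 18.73],
    "+①+②+③+④": [24.91, 21.03, 17.28, 33.40, 25.03, 19.80],
}


def gains_over_baseline(table, baseline_key="Baseline"):
    """Return per-column differences between each configuration and the baseline."""
    base = table[baseline_key]
    return {
        name: [round(value - ref, 2) for value, ref in zip(row, base)]
        for name, row in table.items()
        if name != baseline_key
    }


if __name__ == "__main__":
    for name, deltas in gains_over_baseline(ABLATION).items():
        print(name, " ".join(f"{d:+.2f}" for d in deltas))
```

With these values, every configuration is above the baseline in all six columns, and the full configuration shows the largest gain in each column.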
1 | MATURANA D, SCHERER S. VoxNet: a 3D convolutional neural network for real-time object recognition [C]// Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE, 2015: 922-928. |
2 | QI C R, LIU W, WU C, et al. Frustum PointNets for 3D object detection from RGB-D data [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 918-927. |
3 | ZHOU J, HU Y Y, HU C Y, et al. Weakly perceived object detection method based on point cloud completion and multi-resolution Transformer [J]. Journal of Computer Applications, 2023, 43(7): 2155-2165. (in Chinese) |
4 | WANG Y, CHAO W-L, GARG D, et al. Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8445-8453. |
5 | LI P, CHEN X, SHEN S. Stereo R-CNN based 3D object detection for autonomous driving [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 7636-7644. |
6 | WENG X, KITANI K. Monocular 3D object detection with pseudo-lidar point cloud [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop. Piscataway: IEEE, 2019: 857-866. |
7 | WANG F S, XIONG L, QIAN Y P. Multiscale monocular three-dimensional object detection algorithm incorporating instance depth [J]. Laser & Optoelectronics Progress, 2023, 60(16): 1612002. (in Chinese) |
8 | DING M, HUO Y, YI H, et al. Learning depth-guided convolutions for monocular 3D object detection [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11669-11678. |
9 | MOUSAVIAN A, ANGUELOV D, FLYNN J, et al. 3D bounding box estimation using deep learning and geometry [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5632-5640. |
10 | QIN Z, WANG J, LU Y. MonoGRNet: a geometric reasoning network for monocular 3D object localization [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 8851-8858. |
11 | BRAZIL G, LIU X. M3D-RPN: monocular 3D region proposal network for object detection [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9286-9295. |
12 | GODARD C, AODHA O M, FIRMAN M, et al. Digging into self-supervised monocular depth estimation [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 3827-3837. |
13 | FU H, GONG M, WANG C, et al. Deep ordinal regression network for monocular depth estimation [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 2002-2011. |
14 | CHEN X, MA H, WAN J, et al. Multi-view 3D object detection network for autonomous driving [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6526-6534. |
15 | SHIRMOHAMMADI Z, NIKOOFARD A, ERSHADI G. AM3D: an accurate crosstalk probability modeling to predict channel delay in 3D ICs [J]. Microelectronics Reliability, 2019, 102: 113379. |
16 | READING C, HARAKEH A, CHAE J, et al. Categorical depth distribution network for monocular 3D object detection [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8551-8560. |
17 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |
18 | HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. |
19 | GIRSHICK R. Fast R-CNN [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. |
20 | GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? the KITTI vision benchmark suite [C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2012: 3354-3361. |
21 | CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3213-3223. |
22 | WOO S, PARK J, LEE J-Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19. |
23 | DE BRABANDERE B, JIA X, TUYTELAARS T, et al. Dynamic filter networks [C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 667-675. |
24 | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector [C]// Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 21-37. |
25 | LIN T-Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. |
26 | CHEN X, KUNDU K, ZHU Y, et al. 3D object proposals for accurate object class detection [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 424-432. |
27 | CHEN X, KUNDU K, ZHANG Z, et al. Monocular 3D object detection for autonomous driving [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2147-2156. |
28 | XU B, CHEN Z. Multi-level fusion based 3D object detection from monocular images [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 2345-2353. |
29 | LIU Z, WU Z, TÓTH R. SMOKE: single-stage monocular 3D object detection via keypoint estimation [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2020: 4289-4298. |
30 | CHONG Z, MA X, ZHANG H, et al. MonoDistill: learning spatial features for monocular 3D object detection [EB/OL]. (2022-01-26) [2023-08-29]. . |