Journal of Computer Applications, 2025, Vol. 45, Issue (11): 3698-3706. DOI: 10.11772/j.issn.1001-9081.2024111599
• Multimedia computing and computer simulation •
Changjiang JIANG, Jie XIANG, Xuying HE
Received: 2024-11-11
Revised: 2024-12-30
Accepted: 2025-01-07
Online: 2025-01-14
Published: 2025-11-10
Contact: Changjiang JIANG
About author: XIANG Jie, born in 1997, from Chongqing, M. S. candidate. His research interests include computer vision and object detection.
Abstract:
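Recognizing a target with machine vision algorithms and locating its spatial coordinates is key to vision-based grasping by a robot arm. To address the low localization accuracy and low running efficiency of binocular-vision-based object recognition and localization, a network architecture for robot arm grasping that jointly performs binocular object detection and stereo depth estimation, BDS-YOLO (Binocular Detect and Stereo YOLO), is proposed, together with an object localization algorithm based on BDS-YOLO. The algorithm combines object detection with stereo depth estimation and uses an attention mechanism to exchange feature information across views. This strengthens the feature representation, so the network can obtain a high-quality disparity map through deep feature matching; the disparity map is further refined by a self-attention mechanism and then converted to depth by the triangulation principle. The BDS-YOLO network adopts multi-task learning: the object detection and stereo depth estimation networks are trained simultaneously, on synthetic and real data together. Because dense depth is hard to annotate for real data, a self-supervised learning technique is used to optimize the reconstruction of images from disparity, which improves the generalization of the BDS-YOLO network to the real world. Experimental results show that, on the real dataset, the average precision (AP) of BDS-YOLO for object detection is 6.5 percentage points higher than that of YOLOv8l; the predicted disparity and the converted depth are better than those of dedicated stereo depth estimation algorithms; the inference speed exceeds 20 frame/s; and both recognition and localization of target objects outperform the compared methods, so the network meets the needs of real-time object detection and localization well.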
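The abstract's last step, converting disparity to depth by triangulation, can be illustrated with a minimal sketch. It assumes a rectified stereo pair and the standard pinhole relation Z = f·B/d; the function names, the focal length, baseline, and principal point below are hypothetical illustration values, not the authors' implementation or calibration.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert disparity (pixels) to depth (metres) with Z = f * B / d.

    Pixels whose disparity is not positive are marked as infinitely far.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


def backproject(u, v, depth, focal_px, cx, cy):
    """Back-project pixel (u, v) with known depth to camera-frame (X, Y, Z)."""
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return np.array([x, y, depth])


# Hypothetical camera: f = 1000 px, baseline = 0.1 m, principal point (640, 360).
disp = np.array([[50.0, 25.0], [0.0, 100.0]])
depth = disparity_to_depth(disp, focal_px=1000.0, baseline_m=0.1)

# 3D position of a detection whose box centre is at pixel (700, 400).
point = backproject(700.0, 400.0, depth[0, 0], focal_px=1000.0, cx=640.0, cy=360.0)
```

In a detection-plus-stereo pipeline of this kind, the pixel passed to `backproject` would typically be taken from the detected bounding box, and the depth from the disparity map at or around that pixel; how BDS-YOLO selects these values is not stated in this section.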
Changjiang JIANG, Jie XIANG, Xuying HE. Binocular vision object localization algorithm for robot arm grasping[J]. Journal of Computer Applications, 2025, 45(11): 3698-3706.
| Model | FAT dataset AP | FAT dataset R | Real dataset AP | Real dataset R |
|---|---|---|---|---|
| w/o all | 79.5 | 81.4 | 48.6 | 54.1 |
| w/o cross | 83.5 | 84.8 | 85.6 | 87.8 |
| w/o real | 86.2 | 87.4 | 46.4 | 52.9 |
| w/o direct | 84.4 | 85.7 | 97.7 | 98.3 |
| w/o recon | 83.2 | 86.9 | 96.2 | 98.2 |
| full | 85.9 | 87.2 | 98.1 | 98.7 |
Tab. 1 Ablation experimental results of object detection (unit: %)
| Model | EPE | D1-all | Abs Rel | Sq Rel | RMSE | RMSE log |
|---|---|---|---|---|---|---|
| w/o all | 2.9068 | 19.30 | 0.1293 | 0.4761 | 0.4927 | 0.1678 |
| w/o cross | 2.9719 | 16.72 | 0.1382 | 0.5108 | 0.4713 | 0.1701 |
| w/o real | 0.5031 | 2.89 | 0.0145 | 0.0069 | 0.0706 | 0.0317 |
| w/o direct | 0.5921 | 3.47 | 0.0173 | 0.0151 | 0.0883 | 0.0329 |
| w/o recon | 0.5538 | 3.11 | 0.0160 | 0.0077 | 0.0758 | 0.0339 |
| full | 0.4992 | 2.89 | 0.0143 | 0.0068 | 0.0699 | 0.0315 |
Tab. 2 Ablation experimental results of stereo depth estimation
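The column names in Tab. 2 follow the usual stereo and depth benchmarks. The sketch below shows how such metrics are commonly computed; this section does not give the paper's exact definitions, so the formulas (in particular the KITTI-style D1 threshold of 3 px and 5%) are assumptions, and the function names and sample arrays are hypothetical.

```python
import numpy as np


def disparity_metrics(pred, gt):
    """EPE (mean absolute disparity error, px) and D1-all (% outlier pixels).

    Assumed KITTI-style outlier rule: error > 3 px and > 5% of ground truth.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    err = np.abs(pred - gt)
    epe = err.mean()
    d1_all = ((err > 3.0) & (err > 0.05 * gt)).mean() * 100.0
    return epe, d1_all


def depth_metrics(pred, gt):
    """Abs Rel, Sq Rel, RMSE and RMSE log over strictly positive depths."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    diff = pred - gt
    return {
        "abs_rel": np.mean(np.abs(diff) / gt),
        "sq_rel": np.mean(diff ** 2 / gt),
        "rmse": np.sqrt(np.mean(diff ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
    }


# Hypothetical toy data, only to exercise the functions.
epe, d1_all = disparity_metrics([10.0, 21.0, 44.0, 80.0], [10.0, 20.0, 40.0, 80.0])
depth_scores = depth_metrics([1.0, 2.0, 5.0], [1.0, 2.0, 4.0])
```

Lower is better for every metric in this sketch, which matches how the rows of Tab. 2 are compared.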
| Dataset | Model | Abs/mm | Rel/% | Valid/% |
|---|---|---|---|---|
| FAT | w/o all | 23.38 | 2.39 | 81.56 |
| FAT | w/o cross | 22.56 | 2.11 | 82.38 |
| FAT | w/o real | 8.76 | 0.80 | 93.86 |
| FAT | w/o direct | 11.31 | 1.08 | 90.98 |
| FAT | w/o recon | 9.58 | 0.87 | 93.06 |
| FAT | full | 8.54 | 0.77 | 93.73 |
| Real | w/o all | 32.98 | 3.65 | 60.75 |
| Real | w/o cross | 26.84 | 3.22 | 81.90 |
| Real | w/o real | 24.12 | 2.98 | 63.77 |
| Real | w/o direct | 17.14 | 2.49 | 86.82 |
| Real | w/o recon | 15.68 | 2.07 | 91.49 |
| Real | full | 14.55 | 2.05 | 94.12 |
Tab. 3 Ablation experimental results of object localization
| Network | FAT dataset AP | FAT dataset R | Real dataset AP | Real dataset R |
|---|---|---|---|---|
| Stereo R-CNN | 70.6 | 75.2 | 74.3 | 81.9 |
| DSGN | 78.6 | 81.6 | 83.8 | 86.0 |
| YOLOStereo3D | 77.4 | 78.3 | 85.5 | 83.1 |
| CDN | 79.6 | 82.3 | 84.2 | 87.2 |
| Faster R-CNN | 69.1 | 73.0 | 71.5 | 75.4 |
| RetinaNet | 68.8 | 72.8 | 72.7 | 76.5 |
| CenterNet | 72.8 | 74.5 | 73.1 | 76.2 |
| YOLOv5l | 83.1 | 84.7 | 90.7 | 93.0 |
| YOLOv8l | 83.8 | 85.4 | 91.6 | 92.9 |
| BDS-YOLO | 85.9 | 87.2 | 98.1 | 98.7 |
Tab. 4 Experimental results of object detection (unit: %)
| Network | Abs/mm | Rel/% | Valid/% |
|---|---|---|---|
| Stereo R-CNN | 27.61 | 2.79 | 80.72 |
| DSGN | 19.57 | 1.78 | 87.25 |
| YOLOStereo3D | 21.83 | 2.26 | 83.12 |
| CDN | 17.92 | 1.66 | 88.32 |
| YOLOv8+BBox | 34.85 | 3.47 | 71.49 |
| YOLOv8+Tem | 27.78 | 2.71 | 79.14 |
| YOLOv8+SGBM | 8.92 | 0.81 | 87.12 |
| YOLOv8+CREStereo | 10.41 | 1.18 | 93.21 |
| YOLOv8+RAFT | 11.89 | 1.17 | 92.91 |
| BDS-YOLO | 8.54 | 0.77 | 93.73 |
Tab. 5 Experimental results of object localization on FAT dataset
| Network | Abs/mm | Rel/% | Valid/% | Time/ms |
|---|---|---|---|---|
| Stereo R-CNN | 38.34 | 4.53 | 72.36 | 234 |
| DSGN | 28.47 | 3.94 | 78.93 | 472 |
| YOLOStereo3D | 26.32 | 3.79 | 80.51 | 59 |
| CDN | 22.61 | 2.68 | 85.64 | 218 |
| YOLOv8+BBox | 30.21 | 3.65 | 71.86 | 31 |
| YOLOv8+Tem | 27.44 | 3.28 | 77.23 | 57 |
| YOLOv8+SGBM | 30.87 | 3.56 | 81.93 | 98 |
| YOLOv8+CREStereo | 14.73 | 2.23 | 92.28 | 919 |
| YOLOv8+RAFT | 15.25 | 2.10 | 93.88 | 496 |
| BDS-YOLO | 14.55 | 2.05 | 94.12 | 46 |
Tab. 6 Experimental results of object localization on real dataset
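The Time/ms column of Tab. 6 can be related to the frame rate quoted in the abstract with simple arithmetic. The sketch below assumes that Time/ms is the end-to-end latency for one stereo pair, which this section does not state explicitly; the variable names are illustrative.

```python
# Latencies copied from the Time/ms column of Tab. 6.
latency_ms = {
    "Stereo R-CNN": 234,
    "DSGN": 472,
    "YOLOStereo3D": 59,
    "CDN": 218,
    "YOLOv8+BBox": 31,
    "YOLOv8+Tem": 57,
    "YOLOv8+SGBM": 98,
    "YOLOv8+CREStereo": 919,
    "YOLOv8+RAFT": 496,
    "BDS-YOLO": 46,
}

# Frame rate is the reciprocal of the per-frame latency: fps = 1000 / t_ms.
fps = {name: 1000.0 / ms for name, ms in latency_ms.items()}

# Networks that reach the 20 frame/s mark mentioned in the abstract.
realtime = sorted(name for name, rate in fps.items() if rate >= 20.0)

for name in realtime:
    print(f"{name}: {fps[name]:.1f} frame/s")
```

Under that assumption, 46 ms corresponds to about 21.7 frame/s for BDS-YOLO. YOLOv8+BBox is faster at 31 ms, but its Abs error on the real dataset is roughly twice as large (30.21 mm against 14.55 mm).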
| Network | EPE | D1-all | Abs Rel | Sq Rel | RMSE | RMSE log |
|---|---|---|---|---|---|---|
| YOLO Stereo | 2.6750 | 13.53 | 0.1062 | 0.2631 | 0.3642 | 0.2834 |
| CDN | 1.8828 | 8.23 | 0.0737 | 0.0948 | 0.2777 | 0.2196 |
| ACVNet | 1.2378 | 7.13 | 0.0325 | 0.0242 | 0.1331 | 0.0659 |
| GMStereo | 1.0006 | 7.49 | 0.0268 | 0.0157 | 0.1324 | 0.0572 |
| CREStereo | 0.5091 | 3.33 | 0.0147 | 0.0086 | 0.0867 | 0.0357 |
| RAFT-Stereo | 0.4883 | 3.36 | 0.0186 | 0.0171 | 0.1112 | 0.0471 |
| BDS-YOLO | 0.4992 | 2.89 | 0.0143 | 0.0068 | 0.0699 | 0.0315 |
Tab. 7 Experimental results of stereo depth estimation