Journal of Computer Applications

Binocular vision object location algorithm for robot arm grasping

  

  • Received: 2024-11-11 Revised: 2024-12-30 Accepted: 2025-01-07 Online: 2025-01-14 Published: 2025-01-14

面向机械臂抓取的双目视觉目标定位算法

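Jiang Changjiang (蒋畅江), Xiang Jie (向杰), He Xuying (何旭颖)

  1. Chongqing University of Posts and Telecommunications
  • Corresponding author: Jiang Changjiang
  • Supported by:
    National Natural Science Foundation of China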

Abstract: Recognizing a target object and locating its spatial coordinates with machine vision algorithms is the key to vision-guided robot arm grasping. To address the low positioning accuracy and low running efficiency of binocular-vision-based object recognition and positioning, BDS-YOLO (Binocular Detect and Stereo YOLO), a binocular vision object positioning algorithm for robot arm grasping, was proposed. The algorithm combines object detection with stereo depth estimation and uses an attention mechanism to exchange feature information across views, which improves feature expression ability and enables the network to obtain a high-quality disparity map through deep feature matching. The disparity map is further refined by a self-attention mechanism and then converted into depth according to the triangulation principle. The network adopts multi-task learning to train the object detection and stereo depth estimation networks simultaneously, using both synthetic and real data. Because dense depth is difficult to annotate for real data, self-supervised learning is used to optimize the reconstruction of images from disparity, which improves the network's generalization to the real world. Experimental results show that, on the real dataset, the object detection accuracy is 6.5 percentage points higher than that of YOLOv8l, the predicted disparity and converted depth are better than those of dedicated stereo depth estimation algorithms, the inference speed reaches 20 frame/s, and both object recognition and localization outperform other methods. The proposed network can meet the requirements of real-time object detection and localization.
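The abstract states that disparity is converted to depth by the triangulation principle but does not give the formula. As a minimal sketch, assuming a rectified stereo pair and a pinhole camera model, the standard relation is Z = f · B / d, and a pixel can then be back-projected to a 3D point in the left-camera frame. The `StereoRig` class, the function names, and all numeric values below are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StereoRig:
    """Hypothetical rectified stereo rig; every value here is illustrative."""
    focal_px: float    # focal length, in pixels
    baseline_m: float  # distance between the two camera centres, in metres
    cx: float          # principal point x, in pixels
    cy: float          # principal point y, in pixels


def disparity_to_depth(disparity_px: float, rig: StereoRig) -> float:
    """Triangulation for a rectified pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to give a finite depth")
    return rig.focal_px * rig.baseline_m / disparity_px


def pixel_to_camera_xyz(u: float, v: float, disparity_px: float,
                        rig: StereoRig) -> tuple[float, float, float]:
    """Back-project a left-image pixel to left-camera coordinates (metres)."""
    z = disparity_to_depth(disparity_px, rig)
    x = (u - rig.cx) * z / rig.focal_px
    y = (v - rig.cy) * z / rig.focal_px
    return (x, y, z)


# Assumed example: centre of a detected bounding box and its disparity.
rig = StereoRig(focal_px=800.0, baseline_m=0.12, cx=640.0, cy=360.0)
center_xyz = pixel_to_camera_xyz(u=720.0, v=400.0, disparity_px=96.0, rig=rig)
```

With these assumed values the box centre lies roughly 1 m in front of the left camera, offset about 0.1 m to the right and 0.05 m down. Because depth is inversely proportional to disparity, a fixed disparity error produces a depth error that grows with distance, which is one reason disparity quality matters for grasping.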

Key words: binocular vision, object detection, stereo matching, stereo depth estimation, object localization, deep learning, attention mechanism

摘要: 通过机器视觉算法对目标进行识别并定位其空间坐标是实现机械臂视觉抓取的关键。针对基于双目视觉的目标识别与定位中定位精度低、运行效率不高等问题,提出了面向机械臂抓取的BDS-YOLO(Binocular Detect and Stereo YOLO)双目视觉目标定位算法。它联合目标检测与立体深度估计算法,使用注意力机制进行跨视图特征信息交互,提高特征表达能力,使网络可以通过深度特征匹配获得高质量视差图,经过自注意力机制进一步提升后,由三角测量原理转换为深度信息。网络采用多任务学习,同时训练目标检测与立体深度估计网络,并使用合成与真实数据共同训练。针对真实数据不易标注密集深度问题,采用自监督学习技术,优化由视差重建图像的过程,提高网络对现实世界的泛化能力。实验结果表明:该网络对目标检测的精度相较于YOLOv8l在真实数据集上提升6.5个百分点,预测的视差和转换后的深度优于专门的立体深度估计算法,推理速度可达20 frame/s,并且对目标对象识别和定位均优于其他的方法,提出的网络能较好地完成目标实时检测与定位的需求。

关键词: 双目视觉, 目标检测, 立体匹配, 立体深度估计, 目标定位, 深度学习, 注意力机制
