Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (7): 2208-2215.DOI: 10.11772/j.issn.1001-9081.2023070990

• Multimedia computing and computer simulation •

Monocular 3D object detection method integrating depth and instance segmentation

Xun SUN1, Ruifeng FENG2, Yanru CHEN2

  1. Line Station Design and Research Institute, China Railway Siyuan Survey and Design Group Company Limited, Wuhan Hubei 430063, China
    2. College of Economics and Management, Southwest Jiaotong University, Chengdu Sichuan 610031, China
  • Received:2023-07-21 Revised:2023-09-26 Accepted:2023-09-28 Online:2023-10-26 Published:2024-07-10
  • Contact: Ruifeng FENG
  • About author: SUN Xun, born in 1972 in Wuhan, Hubei, senior engineer. Her research interests include station intelligence, logistics planning, and station design.
    FENG Ruifeng, born in 1999 in Suining, Sichuan, M. S. candidate. His research interests include logistics system modeling and optimization, and machine learning.
    CHEN Yanru, born in 1974 in Baotou, Inner Mongolia, Ph. D., professor. Her research interests include logistics system modeling and optimization, and machine learning.
  • Supported by:
    National Natural Science Foundation of China(62173279)


Abstract:

To address the poor performance of monocular 3D object detection under perspective-induced changes in object size and under occlusion, a monocular 3D object detection method was proposed that fuses depth information with instance segmentation masks. Firstly, a Depth-Mask Attention Fusion (DMAF) module was used to combine depth information with instance segmentation masks, providing more accurate object boundaries. Secondly, dynamic convolution was introduced, and the fused features produced by the DMAF module were used to guide the generation of dynamic convolution kernels, so that objects of different scales could be handled. Moreover, a 2D-3D bounding box consistency loss was added to the loss function, adjusting the predicted 3D bounding box so that its projection coincides closely with the corresponding 2D detection box, thereby improving performance on both the instance segmentation and 3D object detection tasks. Lastly, the effectiveness of the proposed method was confirmed through ablation studies, and the method was evaluated on the KITTI test set. The results show that, compared with a method using only depth estimation maps and instance segmentation masks, the proposed method improves the average precision of car detection at moderate difficulty by 6.36 percentage points, and it outperforms comparison methods such as D4LCN (Depth-guided Dynamic-Depthwise-Dilated Local Convolutional Network) and M3D-RPN (Monocular 3D Region Proposal Network) on both 3D object detection and bird's-eye-view object detection tasks.
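The 2D-3D bounding box consistency idea mentioned in the abstract can be illustrated with a minimal sketch: project the eight corners of a predicted 3D box into the image with a pinhole camera model, take the axis-aligned enclosing rectangle, and penalize its disagreement with the detected 2D box via 1 − IoU. The function names, the (h, w, l) dimension ordering, the corner layout, and the choice of an IoU-based penalty are illustrative assumptions, not the paper's actual formulation.

```python
import math

def project_corners(center, dims, yaw, K):
    """Project the 8 corners of a 3D box (camera coordinates) onto the image plane.

    center: (cx, cy, cz) of the box bottom face; dims: (h, w, l); yaw: rotation
    about the vertical axis; K: 3x3 pinhole intrinsics [[fx,0,u0],[0,fy,v0],[0,0,1]].
    (All conventions here are illustrative assumptions.)
    """
    cx, cy, cz = center
    h, w, l = dims
    # Corner offsets in the object frame (y points down, so the top is at -h).
    xs = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    ys = [ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h]
    zs = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    pts = []
    for x, y, z in zip(xs, ys, zs):
        # Rotate about the vertical axis, then translate to the box centre.
        xr =  math.cos(yaw) * x + math.sin(yaw) * z + cx
        zr = -math.sin(yaw) * x + math.cos(yaw) * z + cz
        yr = y + cy
        # Pinhole projection.
        u = K[0][0] * xr / zr + K[0][2]
        v = K[1][1] * yr / zr + K[1][2]
        pts.append((u, v))
    return pts

def consistency_loss(box2d, center, dims, yaw, K):
    """1 - IoU between a detected 2D box and the projection of the 3D box.

    box2d: (x1, y1, x2, y2). Returns 0 when the projected 3D box exactly
    matches the 2D detection, approaching 1 as the overlap vanishes.
    """
    pts = project_corners(center, dims, yaw, K)
    proj = (min(p[0] for p in pts), min(p[1] for p in pts),
            max(p[0] for p in pts), max(p[1] for p in pts))
    ix1, iy1 = max(box2d[0], proj[0]), max(box2d[1], proj[1])
    ix2, iy2 = min(box2d[2], proj[2]), min(box2d[3], proj[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box2d) + area(proj) - inter
    return 1.0 - inter / union
```

In training, a term like this would be minimized jointly with the detection losses, pulling the predicted 3D pose and dimensions toward values whose projection agrees with the 2D evidence.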

Key words: monocular 3D object detection, deep learning, dynamic convolution, instance segmentation
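The dynamic-convolution step described in the abstract — fused DMAF features guiding the generation of convolution kernels — can be sketched in miniature as follows: a guidance vector is mapped through a linear layer to nine softmax-normalized taps of a 3x3 kernel, which is then applied to a single-channel feature map. The guidance vector, the linear weights, the softmax normalization, and the single-channel setting are all illustrative assumptions, not the network's actual design.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def generate_kernel(guidance, weights):
    """Map a guidance vector (e.g. pooled DMAF features) to 9 kernel taps.

    weights: 9 rows, one per tap, each the length of the guidance vector.
    Softmax keeps the taps positive and summing to 1 (an assumed choice).
    """
    logits = [sum(w * g for w, g in zip(row, guidance)) for row in weights]
    return softmax(logits)

def dynamic_conv(feat, kernel):
    """Apply the generated 3x3 kernel to a 2D feature map with zero padding."""
    H, W = len(feat), len(feat[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < H and 0 <= jj < W:
                        acc += kernel[(di + 1) * 3 + (dj + 1)] * feat[ii][jj]
            out[i][j] = acc
    return out
```

Because the kernel is a function of the input's own fused features rather than a fixed learned tensor, the effective receptive behaviour can adapt per image, which is the property the method relies on for handling objects of different scales.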

