Journal of Computer Applications, 2025, Vol. 45, Issue 3: 946-952. DOI: 10.11772/j.issn.1001-9081.2024030290

• Multimedia computing and computer simulation •

LiDAR-camera 3D object detection based on multi-modal information mutual guidance and supplementation

Chuanhao ZHANG1, Xiaohan TU1, Xuehui GU1, Bo XUAN2

  1. Department of Image and Network Investigation, Zhengzhou Police University, Zhengzhou, Henan 450000, China
  2. Zhengzhou Tuming Intelligent Technology Company Limited, Zhengzhou, Henan 450000, China
  • Received: 2024-03-20  Revised: 2024-06-03  Accepted: 2024-06-04  Online: 2024-07-19  Published: 2025-03-10
  • Contact: Chuanhao ZHANG
  • About the authors: TU Xiaohan, born in 1991, from Nanyang, Henan, Ph.D., lecturer, CCF member. Her research interests include computer vision and machine learning.
    GU Xuehui, born in 1984, from Changchun, Jilin, Ph.D., associate professor, CCF member. His research interests include video image processing and verification.
    XUAN Bo, born in 1978, from Beijing, Ph.D., associate research fellow. His research interests include pattern recognition and graphics and image processing.
  • Supported by:
    National Key Research and Development Program of China; Henan Provincial Natural Science Foundation (242300420693); Henan Province Science and Technology Research Project (212102210531); Fundamental Research Funds for the Central Universities (2022TJJBKY002)

Abstract:

Multi-modal 3D object detection is an important task in computer vision, and how best to fuse information across modalities remains a central research question. Existing methods do not filter information when fusing modalities, and an excess of irrelevant or interfering information can degrade model performance. To address this, a LiDAR-camera 3D object detection model based on multi-modal information mutual guidance and supplementation is proposed, which adaptively selects information from the other modality during feature fusion. The adaptive fusion operates at two levels: data and feature. At the data level, depth maps generated from point clouds and segmentation masks generated from images are used as input to construct instance-level depth maps and instance-level 3D virtual points, which supplement the images and the point clouds, respectively. At the feature level, voxel features generated from point clouds and feature maps generated from images are used as input; for each feature to be fused, key regions are selected from the other modality, and fusion is performed through an attention mechanism. Experimental results on the nuScenes test set show that, compared with traditional unguided fusion models such as BEVFusion and TransFusion, the proposed model improves the two mainstream evaluation metrics, mean Average Precision (mAP) and nuScenes Detection Score (NDS), by 0.9 to 28.9 percentage points and 0.6 to 26.1 percentage points, respectively. These results indicate that the proposed model can effectively improve the accuracy of multi-modal 3D object detection.
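
The data-level step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`project_to_depth_map`, `instance_virtual_points`), the pinhole intrinsic matrix `K`, and the nearest-pixel depth assignment are all assumptions made for illustration. The sketch projects LiDAR points into a sparse depth map, then lifts pixels sampled inside one instance mask back to 3D as virtual points.

```python
import numpy as np

def project_to_depth_map(points_cam, K, height, width):
    """Project (N, 3) points given in the camera frame into a sparse depth map.

    Pixels hit by no point stay 0; if several points hit one pixel,
    the nearest depth is kept.
    """
    pts = points_cam[points_cam[:, 2] > 0]          # keep points in front of the camera
    uvw = pts @ K.T                                  # pinhole projection (assumed model)
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], pts[inside, 2]
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)                  # nearest point wins per pixel
    depth[np.isinf(depth)] = 0.0
    return depth

def instance_virtual_points(depth_map, mask, K, num_samples, rng):
    """Lift pixels sampled inside a boolean instance mask to 3D virtual points.

    Each sampled pixel borrows the depth of the nearest in-mask pixel that
    has a real LiDAR depth (an illustrative choice, not taken from the paper).
    """
    dv, du = np.nonzero(mask & (depth_map > 0))      # in-mask pixels with real depth
    mv, mu = np.nonzero(mask)                        # all in-mask pixels
    if len(du) == 0 or len(mu) == 0:
        return np.empty((0, 3))
    pick = rng.choice(len(mu), size=min(num_samples, len(mu)), replace=False)
    su, sv = mu[pick], mv[pick]
    dist2 = (su[:, None] - du[None, :]) ** 2 + (sv[:, None] - dv[None, :]) ** 2
    nearest = dist2.argmin(axis=1)
    z = depth_map[dv[nearest], du[nearest]]
    pixels = np.stack([su, sv, np.ones_like(su)], axis=1).astype(float)
    rays = pixels @ np.linalg.inv(K).T               # back-project: K^-1 [u, v, 1]
    return rays * z[:, None]
```

Keeping the nearest depth per pixel is one simple way to handle several LiDAR returns landing on the same pixel; a real pipeline would also need the LiDAR-to-camera extrinsic transform, which this sketch assumes has already been applied.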

Key words: multi-modal, 3D object detection, adaptive information fusion, data-level fusion, feature-level fusion, LiDAR-camera

