Journal of Computer Applications (《计算机应用》), sole official website


Survey of BEV 3D object detection algorithms

GUO Yang (郭阳)1,2, WANG Hailiang (王海亮)1,2, GAO Xu (高需)1,2*, WANG Haitao (王海涛)1,2, WANG Yibo (王翌博)1,2

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001; 2. National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001
  • Received: 2025-04-18 Revised: 2025-07-24 Accepted: 2025-07-25 Online: 2025-07-30 Published: 2025-07-30
  • Corresponding author: GAO Xu
  • Supported by:
    Zhengzhou City Major Science and Technology Innovation Project; Postgraduate Education Reform and Quality Improvement Project of Henan Province

Abstract: Visual perception, one of the core technologies for environmental understanding, provides accurate environmental information to intelligent mobile systems such as autonomous vehicles and is an important prerequisite for safe decision-making. Owing to its efficiency and accuracy, 3D object detection based on Bird's Eye View (BEV) has become the mainstream paradigm in environmental perception. To further promote research on BEV-based 3D object detection algorithms, the covered algorithms were first classified systematically by input modality into camera-only, LiDAR-only, and camera-LiDAR fusion algorithms. Second, the role of pre-training algorithms in improving detection performance was explored. Third, the advantages of algorithms that fuse temporal features in dynamic scenes and the performance of algorithms that fuse height features in complex environments were analyzed. Fourth, the breakthroughs in detection accuracy and scene understanding achieved by combining large language models with BEV object detection were reviewed. Finally, the core conclusions were summarized and future research directions were discussed, with the aim of providing new ideas for research in this field.

Key words: Bird's Eye View (BEV), 3D object detection, pre-training, temporal features, height features, large language model
