Journal of Computer Applications, 2026, Vol. 46, Issue (4): 1238-1252. DOI: 10.11772/j.issn.1001-9081.2025040419

• Multimedia computing and computer simulation •

Survey on BEV 3D object detection algorithm system

Yang GUO1,2, Hailiang WANG1,2, Xu GAO1,2, Haitao WANG1,2, Yibo WANG1,2

  1. School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
  2. National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou, Henan 450001, China
  • Received: 2025-04-18 Revised: 2025-07-24 Accepted: 2025-07-25 Online: 2025-07-30 Published: 2026-04-10
  • Corresponding author: Xu GAO
  • About the authors: GUO Yang, born in 1983 in Gongyi, Henan, Ph.D., engineer. His research interests include artificial intelligence and high-performance computing.
    WANG Hailiang, born in 1998 in Nanyang, Henan, M.S. candidate. His research interests include visual perception for autonomous driving.
    WANG Haitao, born in 2000 in Zhumadian, Henan, M.S. candidate. His research interests include large model fine-tuning.
    WANG Yibo, born in 2001 in Shijiazhuang, Hebei, M.S. candidate. His research interests include artificial intelligence.
  • Supported by:
    Science and Technology Innovation 2030 "New Generation of Artificial Intelligence" Major Project (2023ZD0120600); Zhengzhou Major Science and Technology Innovation Special Project (2021KJZX0060)

Abstract:

Visual perception is a core technology for environmental understanding. It supplies intelligent mobile systems, such as autonomous vehicles, with accurate information about their surroundings and is a prerequisite for safe decision-making. Owing to its efficiency and accuracy, 3D object detection based on Bird's-Eye View (BEV) has become the mainstream paradigm in environmental perception. To promote further research on BEV-based 3D object detection, the following work was carried out. Firstly, BEV 3D object detection algorithms were divided, according to the modality of their input data, into three categories: camera-only algorithms, LiDAR-only algorithms, and camera-LiDAR fusion algorithms. Secondly, the role of pre-training algorithms in improving detection performance was explored. Thirdly, the advantages of algorithms that fuse temporal features in dynamic scenarios and the performance of algorithms that fuse height features in complex environments were analyzed. Fourthly, the breakthroughs achieved by combining large models with BEV object detection, in both detection accuracy and scene understanding, were reviewed. Finally, the core conclusions about BEV 3D object detection algorithms were summarized and future research directions were outlined, to provide new ideas for work in this field.

Key words: Bird's-Eye View (BEV), 3D object detection, pre-training, temporal feature, height feature, large model

CLC Number: