Visual perception, a core technology for environmental understanding, provides accurate environmental information for intelligent mobile systems such as autonomous vehicles and is an important prerequisite for safe decision-making. 3D object detection based on Bird’s Eye View (BEV) has become the mainstream paradigm in environmental perception because of its efficiency and accuracy. To further promote research on BEV-based 3D object detection algorithms, the following work was carried out. First, BEV 3D object detection algorithms were systematically classified by input data modality into three categories: camera-only algorithms, LiDAR-only algorithms, and camera-LiDAR fusion algorithms. Second, the role of pre-training algorithms in improving detection performance was explored. Third, the advantages and disadvantages of algorithms that fuse temporal features in dynamic scenarios were analyzed, along with the performance of algorithms that fuse height features in complex environments. Fourth, the breakthroughs achieved by BEV object detection in collaboration with large models, in both detection accuracy and scene understanding, were reviewed. Finally, the core conclusions on BEV 3D object detection algorithms were summarized and future research directions were outlined, with the aim of providing new ideas for research in this field.