In emergencies such as traffic accidents, promptly locating passengers trapped inside a vehicle is critical for rescue operations. To address the challenge of detecting passengers in vehicles during rescue, a detection model based on YOLOv5s is proposed that performs two-stage detection of the vehicle and the passengers inside it. In the first stage, the vehicle is located and the model focuses on the Regions of Interest (ROIs) through dynamic scaling. Histogram equalization is then applied to enhance the contrast of underexposed images, and non-local means filtering is used to remove noise, improving image quality. In the second stage, the model determines whether trapped passengers are present inside the vehicle and identifies their exact locations. Experimental results on the BIT-Vehicle and UA-DETRAC datasets show that, compared with models such as Faster-RCNN and YOLOv7-tiny, the proposed model achieves the best performance in terms of precision, recall, and other metrics, demonstrating stronger robustness and higher accuracy. In addition, the real-time performance of the proposed model meets the needs of intelligent rescue scenarios.
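The first-stage preprocessing described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not define "dynamic scaling", so the sketch assumes it means enlarging the detected vehicle box about its center by a factor and clamping it to the image bounds. The names `expand_roi`, `crop`, and `equalize_hist` and the default `scale` of 1.2 are hypothetical. Non-local means filtering is omitted for brevity; in practice, OpenCV's `cv2.equalizeHist` and `cv2.fastNlMeansDenoising` would normally be used for the enhancement and denoising steps.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive
Image = List[List[int]]          # 8-bit grayscale image, row-major


def expand_roi(box: Box, img_w: int, img_h: int, scale: float = 1.2) -> Box:
    """Scale a box about its center by `scale`, then clamp it to the image.

    Assumption: this stands in for the paper's unspecified "dynamic scaling".
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * scale / 2
    half_h = (y2 - y1) * scale / 2
    return (
        max(0, round(cx - half_w)),
        max(0, round(cy - half_h)),
        min(img_w, round(cx + half_w)),
        min(img_h, round(cy + half_h)),
    )


def crop(img: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in img[y1:y2]]


def equalize_hist(img: Image) -> Image:
    """Global histogram equalization for an 8-bit grayscale image."""
    # Count how often each intensity occurs.
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1

    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = running

    # CDF value at the darkest intensity that actually occurs.
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:
        # Empty or constant image: there is no contrast to stretch.
        return [row[:] for row in img]

    # Map each intensity so that the output CDF is roughly linear on 0..255.
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in img]
```

A typical use would be to pass the first-stage vehicle box through `expand_roi`, cut out that region with `crop`, and run `equalize_hist` on the result before handing it to the second-stage passenger detector. Enlarging the box slightly is one plausible way to avoid clipping passengers near the window edges when the vehicle box is tight.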