Dangerous driving behavior is one of the main causes of serious traffic accidents, so recognizing driver behavior is of great significance for engineering applications. Current mainstream vision-based detection methods focus on the local spatiotemporal features of driver behavior, while global spatial features and long-term temporal correlation features have received less attention; as a result, these methods are limited in their ability to use scene context information when identifying dangerous driving behaviors. To address these problems, a driver behavior recognition method based on a dual-path spatiotemporal network is proposed, which integrates the advantages of different spatiotemporal pathways to enrich the behavioral features. Firstly, an improved Two-Stream convolutional Network (TSN) is used to learn spatiotemporal feature representations while reducing the sparsity of the extracted features. Secondly, a Transformer-based serial spatiotemporal network is constructed to supplement the long-term temporal correlation information. Finally, a fusion decision is made across the two spatiotemporal pathways to enhance the robustness of the model. Experimental results show that the proposed method achieves recognition accuracies of 99.85%, 99.94% and 98.77%, respectively, on three publicly available datasets: the driver fatigue detection dataset YawDD, the driver distraction detection dataset SF-DDDD (State-Farm Distracted Driver Detection Dataset), and the latest driver behavior recognition dataset SynDD1. On SynDD1 in particular, the recognition accuracy is 1.64 percentage points higher than that of MoviNet-A0, a motion-based recognition network. Ablation experiments confirm that the proposed method achieves high accuracy in driver behavior recognition.
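The fusion decision between the two pathways can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so this example assumes a common decision-level scheme: each pathway produces class logits, these are converted to probabilities with softmax, and the two probability vectors are averaged with a weight. The function names `softmax` and `fuse_decisions` and the parameter `weight_a` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(logits_a, logits_b, weight_a=0.5):
    """Weighted average of two pathways' class probabilities.

    Assumption: this is one plausible late-fusion rule, not the paper's
    confirmed method. `logits_a` stands for the two-stream pathway and
    `logits_b` for the Transformer-based pathway.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("both pathways must score the same set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")

    probs_a = softmax(logits_a)
    probs_b = softmax(logits_b)
    fused = [
        weight_a * pa + (1.0 - weight_a) * pb
        for pa, pb in zip(probs_a, probs_b)
    ]
    # The predicted class is the index with the highest fused probability.
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return predicted, fused


if __name__ == "__main__":
    # Illustrative scores for three behavior classes (made-up values).
    predicted, fused = fuse_decisions([2.0, 0.0, 0.0], [0.0, 0.0, 3.0])
    print(predicted, [round(p, 3) for p in fused])
```

With equal weights, the more confident pathway dominates the fused decision; raising `weight_a` shifts the decision toward the first pathway. In practice the weight would be tuned on a validation set.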