Attention Deficit Hyperactivity Disorder (ADHD) is a common childhood neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity, and patients often exhibit characteristic motion patterns. Traditional action recognition algorithms suffer from low recognition accuracy and slow response when handling these actions. To address these issues, an action recognition algorithm for ADHD patients based on skeleton and 3D heatmap data is proposed. Spatial relationships between joints are represented with Gaussian distributions, which preserves spatio-temporal information effectively. To overcome the limitations of single-modal data, a multimodal integration method combining the skeleton and the 3D heatmap is introduced: the output features of a Short 3D-CNN (3D Convolutional Neural Network) and an Adaptive Graph Convolutional Network (AGCN) are fused to exploit the advantages of both modalities, thereby improving action recognition performance. Experimental results on an ADHD patient dataset collected by the Mental Health Center of West China Hospital, Sichuan University, show that the proposed algorithm achieves a Top-1 recognition accuracy of 0.8604 and a Top-5 recognition accuracy of 0.9873 on eight types of actions. Additionally, an automatic ADHD classification algorithm based on action types is proposed, which classifies ADHD into a head and facial action type, a trunk action type, and a limb action type, achieving a recognition accuracy of 75% with a response time of 5 seconds. Compared with two-stream AGCN (2s-AGCN) and PoseConv3D, the proposed algorithm demonstrates higher recognition accuracy in complex action scenarios, providing a new technical approach for personalized ADHD intervention.
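
The two core ideas, Gaussian joint heatmaps stacked over time and fusion of the two network streams, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `sigma` and `weight` parameters, and the choice of weighted score-level fusion are all assumptions made for illustration, and the abstract does not specify how the Short 3D-CNN and AGCN outputs are actually combined.

```python
import numpy as np

def joint_heatmap(joints, height, width, sigma=1.0):
    """Render one Gaussian heatmap per joint (hypothetical helper).

    joints: array of shape (K, 2) holding (x, y) pixel coordinates.
    Returns an array of shape (K, height, width) with peak value 1.0
    at each joint location.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.empty((len(joints), height, width), dtype=np.float32)
    for k, (x, y) in enumerate(joints):
        heat[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heat

def heatmap_volume(sequence, height, width, sigma=1.0):
    """Stack per-frame heatmaps into a 3D heatmap volume.

    sequence: array of shape (T, K, 2), one skeleton per frame.
    Returns an array of shape (K, T, height, width), the kind of
    input a 3D-CNN over (time, height, width) would consume.
    """
    frames = [joint_heatmap(f, height, width, sigma) for f in sequence]
    return np.stack(frames, axis=1)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(cnn_logits, gcn_logits, weight=0.5):
    """Weighted late fusion of class scores from the two streams.

    ASSUMPTION: score-level fusion with a fixed weight; the paper's
    actual fusion of output features may differ.
    """
    return weight * softmax(cnn_logits) + (1 - weight) * softmax(gcn_logits)
```

In this sketch a skeleton sequence of `T` frames with `K` joints becomes a `(K, T, H, W)` volume for the heatmap stream, while the raw joint coordinates would feed the graph stream; `fuse_scores` then combines the per-class outputs of the two streams into one prediction over the action classes.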