To address the challenges of underwater small object detection, such as light scattering, low contrast, and complex backgrounds, an algorithm named CSAF-YOLO (Cross-Scale Adaptive Fusion YOLO) is proposed based on YOLO11. First, a Multi-Scale Collaborative Fusion (MSCF) module is designed to strengthen cross-scale feature synergy and contextual information extraction through spatial fusion and channel interaction mechanisms. Second, a Dynamic Kernel Scale Modulation (DKSM) module is constructed to adaptively generate local and global modulation matrices, which adjust the convolutional kernels so that they adapt better to complex underwater environments. Third, a Multi-Scale Enhanced detection Head (MSE-Head) is proposed to improve small object localization accuracy through scale-aware enhancement and dynamic cross-scale feature fusion. Finally, the MPDIoU (Minimum Point Distance Intersection over Union) loss function is introduced to optimize bounding box regression for underwater small objects through minimum point distance and multi-scale penalty mechanisms. Experimental results on the URPC2020 dataset show that CSAF-YOLO achieves an mAP50 (mean Average Precision at an Intersection over Union threshold of 50%) of 85.0%, an improvement of 1.6 percentage points over YOLO11. The proposed algorithm provides an effective solution for visual tasks in fields such as marine resource exploration and underwater robotic navigation.
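
To make the minimum point distance idea behind the MPDIoU loss concrete, the following minimal Python sketch uses the commonly published MPDIoU formulation: the IoU is reduced by the squared distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes, each normalized by the squared diagonal of the input image. Whether CSAF-YOLO uses exactly this form is an assumption, and the multi-scale penalty mentioned in the abstract is not modelled here. The names `iou`, `mpdiou_loss`, `img_w`, and `img_h` are illustrative and not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred, target, img_w, img_h):
    """Sketch of the MPDIoU loss (assumed standard form, not the paper's code).

    pred, target: boxes as (x1, y1, x2, y2) in pixels.
    img_w, img_h: input image size used to normalize the corner distances.
    """
    # Squared distance between the top-left corners.
    d1_sq = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    # Squared distance between the bottom-right corners.
    d2_sq = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    # Squared image diagonal as the normalization term.
    norm = img_w ** 2 + img_h ** 2

    mpdiou = iou(pred, target) - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou


if __name__ == "__main__":
    # A 10x10 prediction shifted 2 pixels from its ground truth in a 100x100 image.
    print(mpdiou_loss((10, 10, 20, 20), (12, 10, 22, 20), 100, 100))
```

Because the corner-distance terms remain non-zero when the boxes do not overlap, the loss in this sketch still varies with how far apart the boxes are, which is one reason such losses are often considered for small objects where the IoU alone is frequently zero.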