Small object detection remains a challenging problem in object detection, because small objects carry few inherent features and those features are further lost as network depth increases. To address these issues, a small object detection model built on multiple feature enhancements to the network structure is proposed. First, the Spatial Pyramid Pooling (SPP) module in the backbone network is replaced to optimize gradient computation. Second, in the network neck, multi-level bidirectional feature fusion is combined with an Adaptive Feature Fusion (AFF) module added to the output head to achieve multi-level feature enhancement. Experimental results on the COCO2017-val dataset show that, at an Intersection over Union (IoU) threshold of 0.5, the proposed model reaches an average precision of 61.4%, which is 4.7 percentage points higher than that of the widely used YOLOv7 model. In addition, the proposed model runs at 78.2 frames/s on a single GPU, which meets the speed requirements of industrial-grade detection.
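
For readers unfamiliar with the evaluation criterion, the IoU threshold of 0.5 mentioned above can be illustrated with a minimal sketch. This is not the evaluation code used in this work. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates, and the function names `iou` and `is_true_positive` are hypothetical. The full COCO protocol additionally matches detections per class in order of confidence, which is omitted here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Simplified check: a prediction counts as a hit if IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    gt = (0, 0, 10, 10)       # a 10 x 10 pixel ground-truth box
    shifted = (5, 0, 15, 10)  # the same box shifted 5 pixels to the right
    print(iou(gt, shifted), is_true_positive(shifted, gt))
```

The example also hints at why small objects are hard under this criterion: for a 10 x 10 pixel box, a localization error of only 5 pixels already lowers the IoU to one third, below the 0.5 threshold.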