Small target detection method in UAV images based on fusion of dilated convolution and Transformer
Lin WANG, Jingliang LIU, Wuwei WANG
Journal of Computer Applications    2024, 44 (11): 3595-3602.   DOI: 10.11772/j.issn.1001-9081.2023111575

Swin-Det, an Unmanned Aerial Vehicle (UAV) image target detection algorithm based on multi-scale dilated convolution, was proposed to address the complex scenes, widely varying target scales, dense small targets and severe occlusion found in UAV aerial images. Firstly, Swin Transformer was used as the backbone feature extraction network, and a Spatial Information Blending Module (SIBM) was introduced into the backbone to reduce the blurring of target information caused by occlusion between objects. Secondly, a Fusion of Dilation Feature Pyramid Network (FDFPN) was proposed to fuse feature information through multi-branch dilated convolution, which enlarged the receptive field of the network and improved the reuse of feature information, so that the model was able to learn detailed features at different scales. Finally, mismatches in the prediction area and sample imbalance were addressed by using a linear interpolation method and a multi-task loss function, thereby improving the detection precision of the model. Experimental results on the VisDrone dataset show that the Swin-Det algorithm reaches a mean Average Precision (mAP) of 27.2%, which is 4.1 percentage points higher than that of the original Swin Transformer, and that it converges faster under the same training batch. These results indicate that the Swin-Det algorithm can achieve high-precision detection of UAV image targets in complex scenes.
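
The abstract does not describe the internals of FDFPN, so the following is only a minimal sketch of the general idea behind multi-branch dilated convolution, not the authors' implementation. It uses the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions, and it assumes, purely for illustration, a 1-D signal, dilation rates of 1, 2 and 3, and fusion by element-wise summation. All function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation, as used in deep learning)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def multi_branch_fuse(signal, kernel, dilations):
    """Run one branch per dilation rate and fuse the branches by summation.

    Summation is an assumption made for this sketch; the paper's fusion
    scheme is not given in the abstract. Longer outputs are cropped
    symmetrically so every branch is centred on the same input positions.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    target = min(len(b) for b in branches)
    fused = [0.0] * target
    for branch in branches:
        offset = (len(branch) - target) // 2
        for i in range(target):
            fused[i] += branch[offset + i]
    return fused


if __name__ == "__main__":
    # Hypothetical dilation rates: a 3-tap kernel then covers 3, 5 and 7 inputs.
    for d in (1, 2, 3):
        print(d, effective_kernel_size(3, d))
    print(multi_branch_fuse(list(range(10)), [1, 1, 1], (1, 2, 3)))
```

Each branch uses the same number of weights, yet the branches with larger dilation see a wider context, which is why combining them enlarges the receptive field without adding parameters per branch.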
