Abstract:
To address the missed detections, false detections, and complex background interference that affect small target detection from the high-altitude perspective of unmanned aerial vehicles (UAVs), this paper proposes CSD-YOLO, a small target detection model that fuses an attention mechanism with shallow features. First, in the original backbone network, an optimized C3K2 module enhances multi-scale feature extraction and fusion, and a newly designed AS module enriches the gradient flow information, improving the detection of multi-scale targets. Second, the neck network is restructured: a shallow feature fusion module is introduced and its tail end is redesigned to achieve cross-scale feature calibration at both the head and the tail. This strengthens attention to the low-level feature maps and compensates for the loss of small-target features during propagation through the deep layers, while preserving the integrity of the residual spatial information of occluded targets. Finally, scale-, spatial-, and task-aware mechanisms are embedded at the detection output. By dynamically adjusting the detection strategy along these dimensions, they significantly enhance the model's ability to adapt to target deformation and scale changes. Experiments were conducted on two public datasets, VisDrone2019 and TinyPerson. CSD-YOLO achieves mAP50 scores of 41.8% and 29.8%, respectively, which are 9.4 and 2.2 percentage points higher than those of the YOLOv11 model, and its overall model complexity is lower than that of current mainstream detection algorithms.