Abstract:
To address insufficient feature extraction for small objects, interference from background noise, and imprecise localization in object detection, this paper proposes the YOLOv7-BAMFF model, which leverages multiscale feature fusion and an attention mechanism. First, we incorporate the semantically rich Conv2 layer to extract finer-grained features from lower layers, and perform multiscale feature fusion with cross-scale skip connections and adaptive contextual information fusion. Then, during feature re-extraction and optimization, we introduce an enhanced coordinate attention mechanism to suppress complex background noise and accentuate small objects. Finally, we optimize the localization loss function to improve localization precision and add a small-object detection head to improve detection capability. Experimental results on the PASCAL VOC and VisDrone2019 datasets show that our approach raises the average detection accuracy of the baseline YOLOv7 from 82.1% to 85.4% and from 43.8% to 50.4%, respectively (gains of 3.3 and 6.6 percentage points), outperforming other mainstream methods.