Abstract:
Object detection in intelligent driving scenarios suffers from a high missed-detection rate for small objects and from the difficulty of balancing computational efficiency against detection accuracy. To address these problems, a lightweight multi-scale detection framework, RepID-YOLO, is proposed. First, large-kernel depthwise separable convolution is used to construct the RepID feature extraction unit, and a convolutional additive self-attention mechanism is added on top of it to form the RepIDatt module, which reduces the number of parameters while strengthening multi-scale feature representation. Second, the SPD spatial pyramid downsampling strategy is introduced to retain the fine feature details of small objects and reduce their missed-detection rate. Finally, weighted feature fusion is combined with the SimAM attention mechanism to dynamically select and enhance contextual features, improving adaptability to complex scenes. Experimental results show that RepID-YOLO achieves 91.6% mAP@0.5 on the KITTI dataset, outperforming mainstream YOLO-series algorithms such as YOLOv11. It also improves small-object detection while reducing the number of parameters and the computational complexity and maintaining inference speed.
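
As an illustration of the final step (weighted feature fusion followed by SimAM attention), a minimal pure-Python sketch is given here. It is not the RepID-YOLO implementation: the abstract does not specify the fusion rule, so a fast normalized fusion with non-negative weights is assumed, and SimAM is written in its commonly published parameter-free form for a single channel. The function names `weighted_fusion` and `simam`, the constants `eps` and `lam`, and the toy feature maps are all hypothetical.

```python
import math


def weighted_fusion(maps, weights, eps=1e-4):
    """Fuse same-shaped H x W maps with non-negative, normalized weights.

    Assumption: fast normalized fusion (clip weights at zero, divide by
    their sum); the abstract does not state the exact rule used.
    """
    w = [max(float(wi), 0.0) for wi in weights]
    total = sum(w) + eps
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(wi * m[r][c] for wi, m in zip(w, maps)) / total for c in range(cols)]
        for r in range(rows)
    ]


def simam(channel, lam=1e-4):
    """Parameter-free SimAM reweighting of one H x W channel.

    Each activation is scaled by a sigmoid gate that grows with its
    squared distance from the channel mean. Needs at least two values.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / n

    def gate(v):
        e_inv = (v - mean) ** 2 / (4.0 * (var + lam)) + 0.5
        return 1.0 / (1.0 + math.exp(-e_inv))

    return [[v * gate(v) for v in row] for row in channel]


# Toy single-channel maps standing in for a shallow and a deep feature level.
shallow = [[1.0, 2.0], [3.0, 4.0]]
deep = [[0.0, 0.0], [0.0, 8.0]]

fused = weighted_fusion([shallow, deep], [1.0, 1.0])
refined = simam(fused)
```

In this sketch the activation that deviates most from the channel mean receives the largest gate, which is one plausible reading of the "dynamic screening and enhancement of contextual features" described above. In a trained network the fusion weights would be learnable parameters rather than fixed constants.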