Abstract:
To address the complex model structures, large parameter counts and low classification accuracy of existing fine-grained image classification algorithms, a SqueezeNet fine-grained image classification model based on attention feature fusion is proposed. Following an analysis of existing fine-grained image classification algorithms and lightweight CNNs, three typical pre-trained lightweight CNNs were fine-tuned and evaluated on public fine-grained image datasets. SqueezeNet, which performed best in this comparison, was selected as the image feature extractor. Two convolutional modules with attention mechanisms were embedded into each Fire module of the SqueezeNet network. The middle-layer features of the improved SqueezeNet were extracted and fused bilinearly to form a new attention feature map, which was then fused with the global features of the network for classification. Finally, experimental comparison and visualization analysis show that the accuracy of the network with the embedded Convolutional Block Attention Module (CBAM) increased by 8.96%, 4.89% and 5.85% on the bird, car and aircraft datasets, respectively, and that the accuracy of the network with the embedded Squeeze-and-Excitation (SE) module increased by 9.81%, 4.52% and 2.30%, respectively. Moreover, the proposed model compares favorably with existing algorithms in terms of parameter count and computational efficiency.
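The two core operations named above, channel attention and bilinear fusion, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the standard SE formulation (global average pooling, two fully connected layers, sigmoid gating) and the common bilinear pooling recipe (spatially averaged outer product, signed square root, L2 normalization). The function names, the weight matrices `w1` and `w2`, and the flattened list-of-channels layout are all hypothetical choices made for illustration.

```python
import math

def se_attention(feat, w1, w2):
    """Rescale channels with SE-style attention (illustrative sketch).

    feat: list of C channels, each a list of N flattened spatial activations.
    w1:   hypothetical reduction weights, shape (C_reduced, C).
    w2:   hypothetical expansion weights, shape (C, C_reduced).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in feat]
    # Excitation: fully connected layer + ReLU, then fully connected + sigmoid.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h))))
         for row in w2]
    # Scale: multiply every activation of a channel by that channel's gate.
    return [[gate * v for v in ch] for gate, ch in zip(s, feat)]

def bilinear_fuse(a, b):
    """Fuse two feature maps with the same number of spatial positions.

    Returns a flat vector of length C_a * C_b (assumed pooling recipe).
    """
    n = len(a[0])
    # Outer product of channel vectors, averaged over spatial positions.
    raw = [sum(x * y for x, y in zip(ch_a, ch_b)) / n
           for ch_a in a for ch_b in b]
    # Signed square root, then L2 normalization.
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in raw]
    norm = math.sqrt(sum(v * v for v in rooted)) or 1.0
    return [v / norm for v in rooted]

# Toy example: 2 channels, 4 spatial positions, reduction to 1 hidden unit.
feat = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [-1.0]]
attended = se_attention(feat, w1, w2)
fused = bilinear_fuse(attended, feat)
```

In a real network these operations would run on tensors inside the Fire modules and on selected intermediate layers; the abstract does not specify which layers are fused, so the sketch leaves that choice open.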