LI Ming-yue, HE Le-sheng, LEI Chen, GONG You-mei. Fine-grained image classification model based on attention feature fusion with SqueezeNet[J]. Journal of Yunnan University: Natural Sciences Edition, 2021, 43(5): 868-876. DOI: 10.7540/j.ynu.20200577


Fine-grained image classification model based on attention feature fusion with SqueezeNet


Abstract: To address the complex model structures, large parameter counts and low classification accuracy common to existing fine-grained image classification algorithms, a SqueezeNet fine-grained image classification model based on attention feature fusion is proposed. Building on an analysis of existing fine-grained classification algorithms and lightweight convolutional neural networks, three typical pre-trained lightweight CNNs were first fine-tuned and evaluated on public fine-grained image datasets; SqueezeNet, which performed best, was selected as the image feature extractor. Two convolutional modules with attention mechanisms were then embedded into each Fire module of the SqueezeNet network. The intermediate-layer features of the improved SqueezeNet were extracted and bilinearly fused to form a new attention feature map, which was then fused with the network's global features for classification. Finally, experimental comparison and visualization analysis show that embedding the Convolutional Block Attention Module (CBAM) improves classification accuracy by 8.96%, 4.89% and 5.85% on the bird, car and aircraft datasets, respectively, while embedding the Squeeze-and-Excitation (SE) module improves accuracy by 9.81%, 4.52% and 2.30%. Moreover, the proposed model outperforms existing algorithms in terms of parameter count and running efficiency.
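The abstract describes two architectural ideas: attention blocks (CBAM or SE) embedded inside each Fire module, and bilinear fusion of intermediate attention feature maps with the network's global features before classification. The following PyTorch sketch illustrates a Fire module with a CBAM block appended and a generic bilinear-pooling fusion step; all module names, channel sizes and the exact fusion scheme are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (assumed details): a SqueezeNet Fire module followed by CBAM,
# plus bilinear pooling used to fuse feature maps before classification.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """CBAM channel attention: shared MLP over avg- and max-pooled descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx) * x


class SpatialAttention(nn.Module):
    """CBAM spatial attention: 7x7 conv over channel-wise avg and max maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1))) * x


class FireWithCBAM(nn.Module):
    """SqueezeNet Fire module (squeeze 1x1 -> expand 1x1 / 3x3) followed by CBAM."""
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, 1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, 3, padding=1)
        out_ch = expand1x1_ch + expand3x3_ch
        self.channel_att = ChannelAttention(out_ch)
        self.spatial_att = SpatialAttention()

    def forward(self, x):
        x = F.relu(self.squeeze(x))
        x = torch.cat([F.relu(self.expand1x1(x)), F.relu(self.expand3x3(x))], dim=1)
        return self.spatial_att(self.channel_att(x))


def bilinear_pool(feat_a, feat_b):
    """Bilinear pooling of two (B, C, H, W) feature maps into a (B, Ca*Cb) vector."""
    b, ca, h, w = feat_a.shape
    cb = feat_b.shape[1]
    a = feat_a.reshape(b, ca, h * w)
    bm = feat_b.reshape(b, cb, h * w)
    phi = torch.bmm(a, bm.transpose(1, 2)) / (h * w)            # (B, Ca, Cb)
    phi = phi.reshape(b, ca * cb)
    phi = torch.sign(phi) * torch.sqrt(torch.abs(phi) + 1e-8)   # signed square root
    return F.normalize(phi)                                     # L2 normalisation


if __name__ == "__main__":
    # Toy demo: bilinearly pool a hypothetical intermediate attention feature map,
    # concatenate it with globally pooled top-level features, then classify
    # (e.g. 200 bird classes). Shapes are made up for illustration.
    fire = FireWithCBAM(in_ch=64, squeeze_ch=16, expand1x1_ch=64, expand3x3_ch=64)
    mid = fire(torch.randn(2, 64, 28, 28))              # (2, 128, 28, 28)
    top = torch.randn(2, 512, 14, 14)                   # hypothetical global features
    fused = torch.cat([bilinear_pool(mid, mid),
                       F.adaptive_avg_pool2d(top, 1).flatten(1)], dim=1)
    logits = nn.Linear(fused.shape[1], 200)(fused)
    print(logits.shape)                                 # torch.Size([2, 200])
```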

     
