Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (8): 2556-2563. DOI: 10.11772/j.issn.1001-9081.2022071090

• Multimedia computing and computer simulation •

Fine-grained image recognition based on mid-level subtle feature extraction and multi-scale feature fusion

Ailing QI, Xuanlin WANG

  1. College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, Shaanxi 710600, China
  • Received: 2022-07-27 Revised: 2022-11-03 Accepted: 2022-11-07 Online: 2023-01-15 Published: 2023-08-10
  • Contact: Xuanlin WANG
  • About author: QI Ailing, born in 1972, a native of Xi’an, Shaanxi, Ph.D., associate professor. Her research interests include artificial intelligence and digital image processing.
  • Supported by:
    National Natural Science Foundation of China (61674121)

Abstract:

In fine-grained visual recognition, the differences between highly similar categories are subtle, so precise extraction of subtle image features has a crucial impact on recognition accuracy. Using attention mechanisms to extract categorical features has become a trend among current algorithms; however, these algorithms ignore features that are subtle but still distinguishable, and they treat the features of different discriminative regions of an object in isolation. To address these problems, a fine-grained image recognition algorithm based on mid-level subtle feature extraction and multi-scale feature fusion was proposed. First, the salient features of the image were extracted using a weight variance measure on mid-level features that fuse channel and position information. Then, a mask matrix was obtained through channel average pooling to suppress the salient features and enhance the extraction of subtle features in other discriminative regions. Finally, channel weight information and pixel complementary information were used to obtain multi-scale fusion features of channels and pixels, enhancing the diversity and richness of the features from different discriminative regions. Experimental results show that the proposed algorithm achieves 89.52% Top-1 accuracy and 98.46% Top-5 accuracy on the CUB-200-2011 dataset, 94.64% Top-1 accuracy and 98.62% Top-5 accuracy on the Stanford Cars dataset, and 93.20% Top-1 accuracy and 97.98% Top-5 accuracy on the Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft) dataset. Compared with the recurrent collaborative attention feature learning network PCA-Net (Progressive Co-Attention Network), the proposed algorithm improves Top-1 accuracy by 1.22, 0.34 and 0.80 percentage points, respectively, and Top-5 accuracy by 1.03, 0.88 and 1.12 percentage points, respectively.
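
The suppression step described in the abstract (a mask matrix obtained through channel average pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the mask is derived from the pooled map, so the thresholding rule, the `ratio` parameter, and all function names below are assumptions made for illustration only. The weight variance measure used to find salient features is not reproduced here.

```python
def channel_average_pool(features):
    """Average a C x H x W feature map over its channels, giving an H x W map."""
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [sum(features[c][i][j] for c in range(num_channels)) / num_channels
         for j in range(width)]
        for i in range(height)
    ]


def suppression_mask(pooled, ratio=0.5):
    """Build a binary mask: 0.0 at salient positions, 1.0 elsewhere.

    ASSUMPTION: a position counts as salient when its pooled response is at
    least `ratio` times the peak response. The paper may use a different rule.
    """
    peak = max(max(row) for row in pooled)
    threshold = ratio * peak
    return [[0.0 if value >= threshold else 1.0 for value in row] for row in pooled]


def suppress_salient(features, ratio=0.5):
    """Zero out the salient positions in every channel of the feature map."""
    mask = suppression_mask(channel_average_pool(features), ratio)
    return [
        [[channel[i][j] * mask[i][j] for j in range(len(mask[0]))]
         for i in range(len(mask))]
        for channel in features
    ]


# Toy 2-channel, 2 x 2 feature map with one strong response at position (0, 0).
features = [
    [[4.0, 1.0], [0.5, 0.2]],
    [[2.0, 1.0], [0.5, 0.4]],
]
suppressed = suppress_salient(features)
```

In this toy input only position (0, 0) exceeds the assumed threshold, so it is zeroed in both channels while the weaker responses pass through unchanged. A later layer would then have to rely on those weaker responses, which is the intuition behind suppressing salient features to bring out subtle ones.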

Key words: fine-grained image recognition, attention mechanism, weight variance, mask matrix, multi-scale fusion, mid-level feature
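
For readers unfamiliar with the Top-1 and Top-5 metrics reported in the abstract, the following sketch shows how Top-k accuracy is commonly computed. It assumes the usual definition (a sample counts as correct when its true label is among the k highest-scoring classes); the function name and the toy scores are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Return the Top-k accuracy in percent.

    scores: one list of per-class scores for each sample.
    labels: the true class index of each sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda idx: row[idx], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Hypothetical scores for 4 samples over 3 classes.
scores = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.1, 0.4],
]
labels = [0, 0, 2, 1]

top1 = top_k_accuracy(scores, labels, k=1)  # 50.0
top2 = top_k_accuracy(scores, labels, k=2)  # 75.0
```

A gain stated in percentage points, as in the comparison with PCA-Net, is the plain difference between two such accuracy values rather than a relative change.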

CLC Number: