Fine-grained image recognition based on mid-level subtle feature extraction and multi-scale feature fusion

doi:10.11772/j.issn.1001-9081.2022071090

Abstract

In fine-grained visual recognition, highly similar categories differ only in subtle details, so accurately extracting subtle image features is crucial to recognition accuracy. Using an attention mechanism to extract category-discriminative features has become a trend in current research. However, these algorithms overlook subtle but distinguishable features, and they treat the different discriminative regions of an object in isolation, ignoring the relationships among their features. To address these problems, a fine-grained image recognition algorithm based on mid-level subtle feature extraction and multi-scale feature fusion was proposed. First, salient image features were extracted with a weight-variance measure computed on mid-level features fused with channel and position information. Then, a mask matrix obtained through channel average pooling was used to suppress these salient features and strengthen the extraction of subtle features in other discriminative regions. Finally, channel weight information and pixel complementary information were combined to obtain multi-scale channel and pixel fusion features, increasing the diversity and richness of the features from different discriminative regions. Experimental results show that the proposed algorithm achieves 89.52% Top-1 and 98.46% Top-5 accuracy on CUB-200-2011, 94.64% Top-1 and 98.62% Top-5 accuracy on Stanford Cars, and 93.20% Top-1 and 97.98% Top-5 accuracy on Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft). Compared with PCA-Net (Progressive Co-Attention Network), a recurrent co-attention feature learning network, the proposed algorithm improves Top-1 accuracy by 1.22, 0.34 and 0.80 percentage points and Top-5 accuracy by 1.03, 0.88 and 1.12 percentage points on the three datasets, respectively.
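The abstract only outlines the suppression step, so the following plain-Python sketch is an illustration under explicit assumptions, not the authors' implementation. Channel average pooling collapses a C×H×W mid-level feature map into an H×W response map. The rule that marks a position as salient when its pooled response reaches `phi` times the maximum response is a hypothetical choice; the name `phi` merely echoes the ϕ tuned in Tab. 2, whose precise role is not defined in this abstract. The resulting 0/1 mask zeroes the salient positions in every channel.

```python
from typing import List

# A feature map stored as nested lists, indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def channel_average_pool(features: FeatureMap) -> List[List[float]]:
    """Average over the C channels at every spatial position -> H x W map."""
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [sum(features[c][i][j] for c in range(num_channels)) / num_channels
         for j in range(width)]
        for i in range(height)
    ]


def suppression_mask(pooled: List[List[float]], phi: float) -> List[List[int]]:
    """Hypothetical thresholding rule (an assumption, not the paper's formula):
    a position is 'salient' (mask 0) if its pooled response is at least
    phi * max response; every other position keeps mask 1."""
    peak = max(max(row) for row in pooled)
    threshold = phi * peak
    return [[0 if value >= threshold else 1 for value in row] for row in pooled]


def suppress_salient(features: FeatureMap, mask: List[List[int]]) -> FeatureMap:
    """Multiply every channel element-wise by the mask, zeroing the salient
    positions so that a later stage must rely on the remaining regions."""
    return [
        [[value * m for value, m in zip(row, mask_row)]
         for row, mask_row in zip(channel, mask)]
        for channel in features
    ]


if __name__ == "__main__":
    # Toy 2-channel, 2 x 3 feature map whose strongest response is at (0, 1).
    feats = [
        [[0.1, 0.9, 0.2], [0.3, 0.4, 0.1]],
        [[0.2, 0.7, 0.1], [0.5, 0.2, 0.3]],
    ]
    mask = suppression_mask(channel_average_pool(feats), phi=0.8)
    print(mask)  # [[1, 0, 1], [1, 1, 1]]
    print(suppress_salient(feats, mask))
```

Lowering `phi` enlarges the suppressed area: with `phi=0.3` the same toy map also loses positions (1, 0) and (1, 1). A soft mask that down-weights rather than zeroes salient positions would be another possible reading of the abstract.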

Key words: fine-grained image recognition, attention mechanism, weight variance, mask matrix, multi-scale fusion, mid-level feature
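Top-1 and Top-5 accuracy, the metrics reported throughout, count a test image as correctly classified when its ground-truth class is the highest-scoring prediction, or among the five highest-scoring predictions, respectively. The standard metric is sketched below in plain Python; the scores and labels are toy values, not data from the paper, and the demo uses k = 1, 2 and 4 because it has only four classes.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose true label is among the k top-scoring classes."""
    if not labels:
        raise ValueError("labels must be non-empty")
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    scores = [
        [0.1, 0.6, 0.2, 0.1],
        [0.5, 0.3, 0.1, 0.1],
        [0.2, 0.25, 0.5, 0.05],
        [0.4, 0.35, 0.15, 0.1],
    ]
    labels = [1, 1, 3, 1]
    print(top_k_accuracy(scores, labels, 1))  # 25.0
    print(top_k_accuracy(scores, labels, 2))  # 75.0
    print(top_k_accuracy(scores, labels, 4))  # 100.0
```

Top-k accuracy can only grow as k increases, which is why every Top-5 figure in the tables is at least as high as the corresponding Top-1 figure.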

CLC Number: TP391.4

Ailing QI, Xuanlin WANG. Fine-grained image recognition based on mid-level subtle feature extraction and multi-scale feature fusion[J]. Journal of Computer Applications, 2023, 43(8): 2556-2563.

Figures and Tables (12)

Fig. 1 Overall network structure

Fig. 2 Structure of CPFDEN

Fig. 3 Structure of CSMFN

Fig. 4 Residual block structure in ResNet

Tab. 1 Statistics of three fine-grained datasets

Dataset	Object	Classes	Training images	Test images
CUB-200-2011	Bird	200	5 994	5 794
Stanford Cars	Car	196	8 144	8 041
FGVC-Aircraft	Aircraft	100	6 667	3 333

Fig. 5 Examples from datasets

Tab. 2 Top-1 accuracy (%) with different ϕ values on the three datasets

ϕ	CUB-200-2011	Stanford Cars	FGVC-Aircraft
0.5	87.64	91.10	90.94
0.6	88.10	92.54	91.46
0.7	88.94	93.36	93.20
0.8	89.52	94.64	92.79
0.9	89.18	93.87	92.56
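Reading Tab. 2 programmatically shows which ϕ yields each dataset's headline Top-1 accuracy. The values below are transcribed from the table; `best_phi` is a hypothetical helper that simply takes the argmax.

```python
from typing import Dict

# Top-1 accuracy (%) transcribed from Tab. 2, keyed by dataset, then by phi.
TAB2: Dict[str, Dict[float, float]] = {
    "CUB-200-2011": {0.5: 87.64, 0.6: 88.10, 0.7: 88.94, 0.8: 89.52, 0.9: 89.18},
    "Stanford Cars": {0.5: 91.10, 0.6: 92.54, 0.7: 93.36, 0.8: 94.64, 0.9: 93.87},
    "FGVC-Aircraft": {0.5: 90.94, 0.6: 91.46, 0.7: 93.20, 0.8: 92.79, 0.9: 92.56},
}


def best_phi(results: Dict[float, float]) -> float:
    """Return the phi value with the highest Top-1 accuracy."""
    return max(results, key=results.get)


if __name__ == "__main__":
    for dataset, results in TAB2.items():
        phi = best_phi(results)
        print(f"{dataset}: phi = {phi}, Top-1 = {results[phi]:.2f}%")
    # CUB-200-2011: phi = 0.8, Top-1 = 89.52%
    # Stanford Cars: phi = 0.8, Top-1 = 94.64%
    # FGVC-Aircraft: phi = 0.7, Top-1 = 93.20%
```

The best values coincide with the Top-1 accuracies quoted in the abstract, which suggests that the reported results use ϕ = 0.8 for CUB-200-2011 and Stanford Cars but ϕ = 0.7 for FGVC-Aircraft.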

Fig. 6 Training process of the proposed algorithm on each dataset

Tab. 3 Results of ablation experiments on three datasets (accuracy, %)

Algorithm	CUB-200-2011 Top-1	CUB-200-2011 Top-5	Stanford Cars Top-1	Stanford Cars Top-5	FGVC-Aircraft Top-1	FGVC-Aircraft Top-5
ResNet	85.50	92.54	89.80	94.63	90.30	94.41
ResNet-CPFDEN	88.94	96.44	93.40	97.82	92.60	96.83
ResNet-CPFDEN-CSMFN	89.52	98.46	94.64	98.62	93.20	97.98
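The contribution of each module can be read off Tab. 3 as the difference between consecutive rows. The sketch below transcribes the table and computes those differences in percentage points; `gains` is a hypothetical helper name.

```python
from typing import Dict, List

# Accuracy (%) transcribed from Tab. 3. Column order:
# CUB Top-1, CUB Top-5, Cars Top-1, Cars Top-5, Aircraft Top-1, Aircraft Top-5.
TAB3: Dict[str, List[float]] = {
    "ResNet": [85.50, 92.54, 89.80, 94.63, 90.30, 94.41],
    "ResNet-CPFDEN": [88.94, 96.44, 93.40, 97.82, 92.60, 96.83],
    "ResNet-CPFDEN-CSMFN": [89.52, 98.46, 94.64, 98.62, 93.20, 97.98],
}


def gains(before: List[float], after: List[float]) -> List[float]:
    """Per-column improvement in percentage points, rounded to two decimals
    to remove floating-point noise from the subtraction."""
    return [round(a - b, 2) for b, a in zip(before, after)]


if __name__ == "__main__":
    print(gains(TAB3["ResNet"], TAB3["ResNet-CPFDEN"]))
    # [3.44, 3.9, 3.6, 3.19, 2.3, 2.42]
    print(gains(TAB3["ResNet-CPFDEN"], TAB3["ResNet-CPFDEN-CSMFN"]))
    # [0.58, 2.02, 1.24, 0.8, 0.6, 1.15]
```

On these numbers, adding CPFDEN to ResNet accounts for most of the Top-1 improvement (2.30 to 3.60 points), while adding CSMFN contributes a further 0.58 to 1.24 Top-1 points and 0.80 to 2.02 Top-5 points.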

Fig. 7 Comparison of heatmaps produced by the two algorithms

Tab. 4 Comparison of Top-1 classification accuracy (%) of different algorithms on three datasets

Algorithm	CUB-200-2011	Stanford Cars	FGVC-Aircraft
DCL-Net	87.40	93.10	91.70
TPA-CNN	88.00	94.00	91.70
ACB-Net	88.10	94.60	92.40
Proposed algorithm	89.52	94.64	93.20
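To see how large the lead in Tab. 4 is, the sketch below finds the strongest baseline on each dataset and subtracts its Top-1 accuracy from that of the proposed algorithm. The values are transcribed from the table; `margin_over_best` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

DATASETS = ("CUB-200-2011", "Stanford Cars", "FGVC-Aircraft")

# Top-1 accuracy (%) transcribed from Tab. 4, in DATASETS order.
TAB4: Dict[str, Tuple[float, float, float]] = {
    "DCL-Net": (87.40, 93.10, 91.70),
    "TPA-CNN": (88.00, 94.00, 91.70),
    "ACB-Net": (88.10, 94.60, 92.40),
}
PROPOSED = (89.52, 94.64, 93.20)


def margin_over_best(baselines: Dict[str, Tuple[float, ...]],
                     proposed: Tuple[float, ...]) -> List[Tuple[str, str, float]]:
    """For each dataset, return (dataset, best baseline, margin in points)."""
    result = []
    for i, dataset in enumerate(DATASETS):
        best = max(baselines, key=lambda name: baselines[name][i])
        result.append((dataset, best, round(proposed[i] - baselines[best][i], 2)))
    return result


if __name__ == "__main__":
    for row in margin_over_best(TAB4, PROPOSED):
        print(row)
    # ('CUB-200-2011', 'ACB-Net', 1.42)
    # ('Stanford Cars', 'ACB-Net', 0.04)
    # ('FGVC-Aircraft', 'ACB-Net', 0.8)
```

ACB-Net is the strongest baseline on all three datasets, and the margin on Stanford Cars (0.04 points) is much smaller than on the other two.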

Tab. 5 Comparison of accuracy (%) of the proposed algorithm with PPL-Net and PCA-Net on three datasets

Algorithm	CUB-200-2011 Top-1	CUB-200-2011 Top-5	Stanford Cars Top-1	Stanford Cars Top-5	FGVC-Aircraft Top-1	FGVC-Aircraft Top-5
PPL-Net	88.30	—	94.00	—	92.60	—
PCA-Net	88.30	97.43	94.30	97.74	92.40	96.86
Proposed algorithm	89.52	98.46	94.64	98.62	93.20	97.98
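The percentage-point improvements over PCA-Net quoted in the abstract can be checked directly against Tab. 5. The sketch transcribes the two relevant rows; `improvement` is a hypothetical helper, and rounding to two decimals removes floating-point noise from the subtraction.

```python
from typing import Dict, Tuple

# (Top-1, Top-5) accuracy (%) transcribed from Tab. 5.
PCA_NET: Dict[str, Tuple[float, float]] = {
    "CUB-200-2011": (88.30, 97.43),
    "Stanford Cars": (94.30, 97.74),
    "FGVC-Aircraft": (92.40, 96.86),
}
PROPOSED: Dict[str, Tuple[float, float]] = {
    "CUB-200-2011": (89.52, 98.46),
    "Stanford Cars": (94.64, 98.62),
    "FGVC-Aircraft": (93.20, 97.98),
}


def improvement(ours: Tuple[float, ...], baseline: Tuple[float, ...]) -> Tuple[float, ...]:
    """Element-wise gain in percentage points."""
    return tuple(round(o - b, 2) for o, b in zip(ours, baseline))


if __name__ == "__main__":
    for dataset in PROPOSED:
        print(dataset, improvement(PROPOSED[dataset], PCA_NET[dataset]))
    # CUB-200-2011 (1.22, 1.03)
    # Stanford Cars (0.34, 0.88)
    # FGVC-Aircraft (0.8, 1.12)
```

The results match the Top-1 gains (1.22, 0.34, 0.80) and Top-5 gains (1.03, 0.88, 1.12) stated in the abstract.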

References (22)

1	MA Y, ZHI M, YIN Y J, et al. Review of applications of CNN and Transformer in fine-grained image recognition [J]. Computer Engineering and Applications, 2022, 58(19): 53-63. (in Chinese) 10.3778/j.issn.1002-8331.2201-0374
2	WEI X S, XIE C W, WU J X, et al. Mask-CNN: localizing parts and selecting descriptors for fine-grained bird species categorization [J]. Pattern Recognition, 2018, 76: 704-714. 10.1016/j.patcog.2017.10.002
3	ZHANG N, DONAHUE J, GIRSHICK R, et al. Part-based R-CNNs for fine-grained category detection [C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8689. Cham: Springer, 2014: 834-849.
4	ZHANG X F, LIN W S, HUANG Q M. Fine-grained image quality assessment: a revisit and further thinking [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(5): 2746-2759. 10.1109/tcsvt.2021.3096528
5	CHEN Y, BAI Y L, ZHANG W, et al. Destruction and construction learning for fine-grained image recognition [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5152-5161. 10.1109/cvpr.2019.00530
6	YAN T T, WANG S J, WANG Z H, et al. Progressive learning for weakly supervised fine-grained classification [J]. Signal Processing, 2020, 171: No.107519. 10.1016/j.sigpro.2020.107519
7	ZHANG T, CHANG D L, MA Z Y, et al. Progressive co-attention network for fine-grained visual classification [C]// Proceedings of the 2021 International Conference on Visual Communications and Image Processing. Piscataway: IEEE, 2021: 1-5. 10.1109/vcip53242.2021.9675376
8	ZHAO Y F, YAN K, HUANG F Y, et al. Graph-based high order relation discovery for fine-grained recognition [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 15074-15083. 10.1109/cvpr46437.2021.01483
9	WEI H, ZHU M, WANG B, et al. Two-level progressive attention convolutional network for fine-grained image recognition [J]. IEEE Access, 2020, 8: 104985-104995. 10.1109/access.2020.2999722
10	Southeast University. A fine-grained image recognition method based on multi-scale feature fusion: 201910282865.4 [P]. 2019-08-06. (in Chinese)
11	JI R Y, WEN L Y, ZHANG L B, et al. Attention convolutional binary neural tree for fine-grained visual categorization [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10465-10474. 10.1109/cvpr42600.2020.01048
12	YAN T T, SHI J, LI H J. Discriminative information restoration and extraction for weakly supervised low-resolution fine-grained image recognition [J]. Pattern Recognition, 2022, 127: No.108629. 10.1016/j.patcog.2022.108629
13	CAO S Y, WANG W, ZHANG J, et al. A few-shot fine-grained image classification method leveraging global and local structures [J]. International Journal of Machine Learning and Cybernetics, 2022, 13(8): 2273-2281. 10.1007/s13042-022-01522-w
14	WANG L, HE K, FENG X, et al. Multilayer feature fusion with parallel convolutional block for fine-grained image classification [J]. Applied Intelligence, 2022, 52(3): 2872-2883. 10.1007/s10489-021-02573-2
15	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
16	HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. 10.1109/cvpr.2018.00745
17	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19.
18	WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset [EB/OL]. [2020-07-05].
19	KRAUSE J, STARK M, DENG J, et al. 3D object representations for fine-grained categorization [C]// Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2013: 554-561. 10.1109/iccvw.2013.77
20	MAJI S, RAHTU E, KANNALA J, et al. Fine-grained visual classification of aircraft [EB/OL]. (2013-06-21) [2020-07-05].
21	LI P H, XIE J T, WANG Q L, et al. Towards faster training of global covariance pooling networks by iterative matrix square root normalization [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 947-955. 10.1109/cvpr.2018.00105
22	LERMA M, LUCAS M. Grad-CAM++ is equivalent to Grad-CAM with positive gradients [C/OL]// Proceedings of the 24th Irish Machine Vision and Image Processing Conference [2022-05-22]. 10.56541/awjv6348
