1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
2. Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
3. Key Laboratory of Hubei Province for Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan, Hubei 430065, China
4. School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
5. Engineering Research Center for Metallurgical Automation and Measurement Technology, Ministry of Education (Wuhan University of Science and Technology), Wuhan, Hubei 430081, China
To address the problem that traditional attention-based neural networks cannot attend to local features and rotation-invariant features jointly, a weakly supervised fine-grained image classification method based on a multi-branch neural network model was proposed. Firstly, a lightweight Class Activation Map (CAM) network was used to localize local regions with potential semantic information, and a residual network ResNet-50 with deformable convolution and an Oriented Response Network (ORN) with rotation-invariant coding were designed. Secondly, pre-trained models were used to initialize the respective feature networks, and the original image and the localized regions were fed in to fine-tune the model. Finally, the three intra-branch losses and the between-branch losses were combined to optimize the entire network, and classification prediction was performed on the test set. The proposed method achieves classification accuracies of 87.7% and 90.8% on the CUB-200-2011 and FGVC_Aircraft datasets respectively, which are 1.2 and 0.9 percentage points higher than those of the Multi-Attention Convolutional Neural Network (MA-CNN) method. On the Aircraft_2 dataset, the proposed method reaches a classification accuracy of 91.8%, which is 4.1 percentage points higher than that of ResNet-50. The experimental results show that the proposed method effectively improves the accuracy of weakly supervised fine-grained image classification.
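The localization step in the abstract can be illustrated with a minimal sketch. It assumes the standard CAM formulation, in which the activation map for a class is the sum of the last convolutional feature maps weighted by that class's classifier weights, followed by a bounding box around the strongly activated cells. The function names, the toy feature maps, and the relative threshold of 0.5 are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List, Tuple

FeatureMaps = List[List[List[float]]]  # indexed as [channel][row][col]


def class_activation_map(features: FeatureMaps,
                         class_weights: List[float]) -> List[List[float]]:
    """Weighted sum of feature maps: CAM(y, x) = sum_k w_k * f_k(y, x)."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w_k, f_k in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w_k * f_k[y][x]
    return cam


def localize_region(cam: List[List[float]],
                    rel_threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Inclusive box (top, left, bottom, right) around cells whose activation
    lies at least rel_threshold of the way from the map minimum to its maximum.
    The threshold value is an assumption, not a setting from the paper."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    cut = lo + rel_threshold * (hi - lo)
    ys = [y for y, row in enumerate(cam) for v in row if v >= cut]
    xs = [x for row in cam for x, v in enumerate(row) if v >= cut]
    return min(ys), min(xs), max(ys), max(xs)


# Toy input: two 3x3 feature maps and the classifier weights of one class.
features = [
    [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
weights = [2.0, 0.5]

cam = class_activation_map(features, weights)
box = localize_region(cam)
print(cam)  # [[0.5, 0.0, 0.0], [0.0, 2.0, 2.0], [0.0, 2.0, 2.0]]
print(box)  # (1, 1, 2, 2)
```

In a full pipeline the box would be scaled back to image coordinates and the cropped region passed to the branch networks. That part depends on details the abstract does not give, so it is left out here.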