Fine-grained image classification method based on multi-feature combination
ZOU Chengming1,2, LUO Ying1,2, XU Xiaolong1,2
1. Hubei Key Laboratory of Transportation Internet of Things (Wuhan University of Technology), Wuhan, Hubei 430070, China; 2. College of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China
Abstract: Because a single feature representation can limit the accuracy of fine-grained image classification, a multi-feature combination representation method based on Convolutional Neural Network (CNN) and Scale Invariant Feature Transform (SIFT) was proposed. Features were extracted at three levels: the entire target, the key parts, and the key points. Firstly, two CNN models were trained on the target-entirety regions and the head-only regions of the fine-grained image library, respectively, and were used to extract the target-entirety and head-only CNN features. Secondly, SIFT key points were extracted from all the target-entirety regions in the image library, and a codebook was generated by K-means clustering. Then, the SIFT descriptors of each target-entirety region were encoded into a feature vector by using the Vector of Locally Aggregated Descriptors (VLAD) together with the codebook. Finally, a Support Vector Machine (SVM) was used to classify the fine-grained images based on the combination of these features. The method was evaluated on the CUB-200-2011 dataset and compared with single-feature representation methods. The experimental results show that the proposed method improves the classification accuracy by 13.31% compared with the single CNN feature representation, which demonstrates the positive effect of multi-feature combination on fine-grained image classification.
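The VLAD encoding and feature combination steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a codebook that has already been produced by K-means, uses toy 2-D descriptors in place of 128-D SIFT descriptors, applies L2 normalization to the VLAD vector (a common choice that the abstract does not specify), and treats "combination" as plain concatenation, which is also an assumption. The function names `vlad_encode` and `combine_features` are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad_encode(descriptors, codebook):
    """Encode local descriptors into one VLAD vector of length K * D.

    descriptors: list of D-dimensional local descriptors (e.g. SIFT).
    codebook:    list of K codewords of dimension D (assumed to come from K-means).
    """
    k, d = len(codebook), len(codebook[0])
    residuals = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Assign the descriptor to its nearest codeword.
        nearest = min(range(k), key=lambda i: squared_distance(desc, codebook[i]))
        # Accumulate the residual (descriptor minus codeword).
        for j in range(d):
            residuals[nearest][j] += desc[j] - codebook[nearest][j]
    flat = [v for row in residuals for v in row]
    # L2-normalize the flattened vector (assumption, not stated in the abstract).
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def combine_features(*feature_vectors):
    """Concatenate per-image feature vectors (assumed combination rule)."""
    combined = []
    for vec in feature_vectors:
        combined.extend(vec)
    return combined


# Toy example: 2 codewords, 3 descriptors, 2 dimensions.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
vlad = vlad_encode(descriptors, codebook)

# Placeholder values standing in for the two CNN feature vectors.
entire_cnn = [0.5, 0.2]
head_cnn = [0.1]
combined = combine_features(entire_cnn, head_cnn, vlad)
```

In a full pipeline, one combined vector per image would then be passed to an SVM classifier. The vector lengths here are illustrative only.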