Review of fine-grained image categorization

doi:10.11772/j.issn.1001-9081.2021122090

Journal of Computer Applications, 2023, Vol. 43, Issue (1): 51-60.

Special Issue: Artificial intelligence; Review

SHEN Zhijun^1,2, MU Lina^2, GAO Jing^2, SHI Yuanhang^2, LIU Zhiqiang^2

1. School of Computer and Information Engineering, Fuyang Normal University, Fuyang, Anhui 236037, China
2. College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia 010011, China

Received: 2021-12-14 Revised: 2022-02-12 Online: 2022-08-02
Contact: SHEN Zhijun, born in 1976, Ph. D., professor. His research interests include intelligent computing and data mining. E-mail: shensljx@sina.com
About authors: SHEN Zhijun, born in 1976, a native of Xinyang, Henan, Ph. D., professor. His research interests include intelligent computing and data mining; MU Lina, born in 1996, a native of Datong, Shanxi, M. S. candidate. Her research interests include computer vision and image recognition; GAO Jing, born in 1970, a native of Hohhot, Inner Mongolia, Ph. D., professor, doctoral supervisor. Her research interests include big data intelligence and knowledge discovery, analysis of animal and plant phenotype and omics big data, and intelligent systems for agriculture and animal husbandry; SHI Yuanhang, born in 1997, a native of Xinxiang, Henan, M. S. candidate. His research interests include artificial intelligence; LIU Zhiqiang, born in 1996, a native of Fuzhou, Jiangxi, M. S. candidate. His research interests include artificial intelligence.
Supported by:
This work is partially supported by Scientific Research Project of Fuyang Normal University (2021KYQD0028), Science and Technology Research Project of Inner Mongolia Autonomous Region (2021GG0090), Doctoral Research Start-up Fund of Inner Mongolia Agricultural University (BJ2013B-1), Open Project of Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory (IMDBD2020015).

Abstract

Fine-grained images are characterized by large intra-class variance and small inter-class variance, which makes Fine-Grained Image Categorization (FGIC) much more difficult than traditional image classification. This paper describes the application scenarios, task difficulties, algorithm development history and commonly used datasets of FGIC, and mainly presents an overview of the related algorithms. Classification methods based on local detection usually rely on concatenation, summation and pooling operations; their model training is complex, and they have many limitations in practical applications. Classification methods based on linear features imitate the two neural pathways of human vision, one for recognition and one for localization, and achieve relatively better classification results. Classification methods based on the attention mechanism imitate the way humans observe external things: they first scan the whole scene, then lock onto the key regions of interest and form an attention focus, which further improves classification. Finally, in view of the shortcomings of current research, future research directions for FGIC are proposed.

Key words: Fine-Grained Image Categorization (FGIC), deep learning, Convolutional Neural Network (CNN), attention mechanism, computer vision
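
The abstract describes the linear-feature methods as imitating two neural pathways of human vision, one for recognition and one for localization. A well-known realization of this two-pathway idea is the two-stream bilinear pooling of Lin et al. (B-CNN); treating the abstract's linear-feature methods as this model is an assumption made here for illustration only. The minimal pure-Python sketch below assumes each stream has already produced one feature vector per spatial location; the function name `bilinear_pool` and the toy inputs are hypothetical. It averages the outer products of the two streams over all locations, then applies a signed square root and L2 normalization, yielding the vector that a classifier would consume.

```python
import math
from typing import List

# Rows are spatial locations, columns are channels of one CNN stream.
Matrix = List[List[float]]


def bilinear_pool(stream_a: Matrix, stream_b: Matrix) -> List[float]:
    """Hypothetical helper: two-stream bilinear pooling in the style of B-CNN.

    Assumes both streams were already computed by two CNNs over the same
    spatial grid, so row l of each matrix describes the same location l.
    """
    if not stream_a or len(stream_a) != len(stream_b):
        raise ValueError("both streams must cover the same, non-empty set of locations")
    ca, cb = len(stream_a[0]), len(stream_b[0])
    pooled = [0.0] * (ca * cb)
    # Accumulate the outer product a_l * b_l^T at every location l,
    # flattened row-major into a vector of length ca * cb.
    for a_vec, b_vec in zip(stream_a, stream_b):
        for i, a in enumerate(a_vec):
            for j, b in enumerate(b_vec):
                pooled[i * cb + j] += a * b
    n = len(stream_a)
    pooled = [v / n for v in pooled]  # average pooling over locations
    # Signed square root dampens very large activations.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    # L2 normalization; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


if __name__ == "__main__":
    # Two locations; stream A has 2 channels, stream B has 1 channel.
    print(bilinear_pool([[1.0, 0.0], [0.0, 2.0]], [[3.0], [1.0]]))
```

For the toy input, the averaged outer products are [1.5, 1.0], so the result is the unit vector proportional to [sqrt(1.5), 1.0]. The output has C_a x C_b entries (512 x 512 for two VGG-16 conv5_3 streams), which is why practical systems compute this with vectorized tensor libraries and why compact and low-rank approximations of the pooling were later proposed.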

SHEN Zhijun, MU Lina, GAO Jing, SHI Yuanhang, LIU Zhiqiang. Review of fine-grained image categorization[J]. Journal of Computer Applications, 2023, 43(1): 51-60.
