Review of fine-grained image categorization

Journal of Computer Applications, 2023, Vol. 43, Issue 1: 51-60. DOI: 10.11772/j.issn.1001-9081.2021122090

Topics: Artificial intelligence; Review

SHEN Zhijun^1,2, MU Lina^2, GAO Jing^2, SHI Yuanhang^2, LIU Zhiqiang^2

1. School of Computer and Information Engineering, Fuyang Normal University, Fuyang, Anhui 236037, China
2. College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia 010011, China

Received: 2021-12-14  Revised: 2022-02-12  Online: 2022-08-02
Corresponding author: SHEN Zhijun (born 1976), male, from Xinyang, Henan; Ph.D., professor. Research interests: intelligent computing, data mining. Email: shensljx@sina.com
About the authors:
SHEN Zhijun (born 1976), male, from Xinyang, Henan; Ph.D., professor. Research interests: intelligent computing, data mining.
MU Lina (born 1996), female, from Datong, Shanxi; M.S. candidate. Research interests: computer vision, image recognition.
GAO Jing (born 1970), female, from Hohhot, Inner Mongolia; Ph.D., professor, doctoral supervisor. Research interests: big data intelligence and knowledge discovery, analysis of animal and plant phenotype and omics big data, intelligent systems for agriculture and animal husbandry.
SHI Yuanhang (born 1997), male, from Xinxiang, Henan; M.S. candidate. Research interests: artificial intelligence.
LIU Zhiqiang (born 1996), male, from Fuzhou, Jiangxi; M.S. candidate. Research interests: artificial intelligence.
Supported by:
This work is partially supported by the Scientific Research Project of Fuyang Normal University (2021KYQD0028), the Science and Technology Research Project of Inner Mongolia Autonomous Region (2021GG0090), the Doctoral Research Start-up Fund of Inner Mongolia Agricultural University (BJ2013B?1), and the Open Project of Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory (IMDBD2020015).

Abstract

Fine-grained images have large intra-class variance and small inter-class variance, which makes Fine-Grained Image Categorization (FGIC) much more difficult than traditional image classification tasks. This review describes the application scenarios, task difficulties, algorithm development history and commonly used datasets of FGIC, and focuses on an overview of the related algorithms. Classification methods based on local detection usually use operations such as concatenation, summation and pooling; their model training is relatively complex, and they have many limitations in practical applications. Classification methods based on linear features imitate the two neural pathways of human vision, which perform recognition and localization respectively, and achieve relatively better classification results. Classification methods based on attention mechanisms simulate the way humans observe external things, first scanning the whole scene and then locking onto key regions to form an attention focus, which further improves the classification results. Finally, in view of the shortcomings of current research, future research directions for FGIC are discussed.

Key words: Fine-Grained Image Categorization (FGIC), deep learning, Convolutional Neural Network (CNN), attention mechanism, computer vision
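
The opening claim of the abstract (large intra-class variance, small inter-class variance) can be made concrete with a toy calculation. The sketch below is an illustration only and is not taken from the paper: the function names, the two-dimensional feature vectors and the class labels are all made up, and the variance definitions used (mean squared distance of samples to their class centroid, and sample-weighted squared distance of class centroids to the overall mean) are one common choice assumed here. It compares a coarse-grained split, where classes sit far apart, with a fine-grained one, where class centroids nearly coincide while samples within each class spread widely.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_ratio(features, labels):
    """Return (intra, inter, intra / inter) for labelled feature vectors."""
    by_class = defaultdict(list)
    for vec, lab in zip(features, labels):
        by_class[lab].append(vec)

    centroids = {lab: mean_vector(vecs) for lab, vecs in by_class.items()}
    overall = mean_vector(features)

    # Intra-class variance: mean squared distance of each sample
    # to the centroid of its own class.
    intra = sum(
        squared_distance(vec, centroids[lab])
        for vec, lab in zip(features, labels)
    ) / len(features)

    # Inter-class variance: sample-weighted mean squared distance
    # of each class centroid to the overall mean.
    inter = sum(
        len(vecs) * squared_distance(centroids[lab], overall)
        for lab, vecs in by_class.items()
    ) / len(features)

    # Assumes the class centroids are not all identical (inter > 0).
    return intra, inter, intra / inter


# Hypothetical coarse-grained task: tight classes, far apart.
coarse_features = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
coarse_labels = ["cat", "cat", "car", "car"]

# Hypothetical fine-grained task: spread-out classes, centroids close together.
fine_features = [[0.0, 0.0], [2.0, 0.0], [0.5, 0.0], [2.5, 0.0]]
fine_labels = ["species_a", "species_a", "species_b", "species_b"]

for name, feats, labs in [
    ("coarse", coarse_features, coarse_labels),
    ("fine", fine_features, fine_labels),
]:
    intra, inter, ratio = variance_ratio(feats, labs)
    print(f"{name}: intra={intra:.4f} inter={inter:.4f} ratio={ratio:.4f}")
```

On this toy data the intra/inter ratio is far below 1 for the coarse-grained split and above 1 for the fine-grained one, which is the regime in which a classifier has to rely on small, localized differences rather than on overall appearance.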

SHEN Zhijun, MU Lina, GAO Jing, SHI Yuanhang, LIU Zhiqiang. Review of fine-grained image categorization[J]. Journal of Computer Applications, 2023, 43(1): 51-60.

