Journal of Computer Applications, 2023, Vol. 43, Issue 8: 2556-2563. DOI: 10.11772/j.issn.1001-9081.2022071090
Special Issue: Multimedia computing and computer simulation
Received: 2022-07-27
Revised: 2022-11-03
Accepted: 2022-11-07
Online: 2023-01-15
Published: 2023-08-10
Contact: Xuanlin WANG
About author: QI Ailing, born in 1972 in Xi'an, Shaanxi, Ph. D., associate professor. Her research interests include artificial intelligence and digital image processing.
Ailing QI, Xuanlin WANG. Fine-grained image recognition based on mid-level subtle feature extraction and multi-scale feature fusion[J]. Journal of Computer Applications, 2023, 43(8): 2556-2563.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022071090
Dataset | Name | Classes | Training samples | Test samples |
---|---|---|---|---|
CUB-200-2011 | Bird | 200 | 5 994 | 5 794 |
Stanford Cars | Car | 196 | 8 144 | 8 041 |
FGVC-Aircraft | Aircraft | 100 | 6 667 | 3 333 |
Tab. 1 Statistics of three fine-grained datasets
Parameter value | CUB-200-2011 | Stanford Cars | FGVC-Aircraft |
---|---|---|---|
0.5 | 87.64 | 91.10 | 90.94 |
0.6 | 88.10 | 92.54 | 91.46 |
0.7 | 88.94 | 93.36 | 93.20 |
0.8 | 89.52 | 94.64 | 92.79 |
0.9 | 89.18 | 93.87 | 92.56 |
Tab. 2 Top-1 accuracy (%) for different parameter values on the three datasets
Algorithm | CUB-200-2011 Top-1 | CUB-200-2011 Top-5 | Stanford Cars Top-1 | Stanford Cars Top-5 | FGVC-Aircraft Top-1 | FGVC-Aircraft Top-5 |
---|---|---|---|---|---|---|
ResNet | 85.50 | 92.54 | 89.80 | 94.63 | 90.30 | 94.41 |
ResNet-CPFDEN | 88.94 | 96.44 | 93.40 | 97.82 | 92.60 | 96.83 |
ResNet-CPFDEN-CSMFN | 89.52 | 98.46 | 94.64 | 98.62 | 93.20 | 97.98 |
Tab. 3 Results of ablation experiments on three datasets
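The Top-1 and Top-5 columns count a prediction as correct when the true class is among the 1 or 5 highest-scoring classes. The following is a minimal sketch of how such a metric is commonly computed, assuming per-class scores are available for each test image. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Percentage of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by descending score and keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example with 4 samples and 3 classes (made-up scores, not from the paper).
toy_scores = [
    [0.7, 0.2, 0.1],  # true class 0: ranked first
    [0.5, 0.3, 0.2],  # true class 1: ranked second
    [0.1, 0.3, 0.6],  # true class 2: ranked first
    [0.6, 0.3, 0.1],  # true class 2: ranked third
]
toy_labels = [0, 1, 2, 2]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

Because every Top-1 hit is also a Top-k hit for larger k, Top-5 accuracy is never lower than Top-1 accuracy, which matches the pattern in every row of the table.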
Algorithm | CUB-200-2011 | Stanford Cars | FGVC-Aircraft |
---|---|---|---|
DCL-Net | 87.40 | 93.10 | 91.70 |
TPA-CNN | 88.00 | 94.00 | 91.70 |
ACB-Net | 88.10 | 94.60 | 92.40 |
Proposed algorithm | 89.52 | 94.64 | 93.20 |
Tab. 4 Comparison of Top-1 classification accuracy of different algorithms on three datasets
Algorithm | CUB-200-2011 Top-1 | CUB-200-2011 Top-5 | Stanford Cars Top-1 | Stanford Cars Top-5 | FGVC-Aircraft Top-1 | FGVC-Aircraft Top-5 |
---|---|---|---|---|---|---|
PPL-Net | 88.30 | - | 94.00 | - | 92.60 | - |
PCA-Net | 88.30 | 97.43 | 94.30 | 97.74 | 92.40 | 96.86 |
Proposed algorithm | 89.52 | 98.46 | 94.64 | 98.62 | 93.20 | 97.98 |
Tab. 5 Comparison of accuracy of the proposed algorithm with PPL-Net and PCA-Net algorithms on three datasets