Weakly supervised fine-grained image classification algorithm based on attention-attention bilinear pooling

Journal of Computer Applications, 2021, Vol. 41, Issue (5): 1319-1325. DOI: 10.11772/j.issn.1001-9081.2020071105

Special topic: Artificial Intelligence

LU Xinwei, YU Pengfei, LI Haiyan, LI Hongsong, DING Wenqian

School of Information Science and Engineering, Yunnan University, Kunming Yunnan 650500, China

Received: 2020-07-27 Revised: 2020-09-29 Online: 2021-05-19 Published: 2021-05-10
Corresponding author: YU Pengfei
About the authors: LU Xinwei (born 1995), male, from Wuxi, Jiangsu, M.S. candidate; research interests: image processing, deep learning. YU Pengfei (born 1974), male, from Kunming, Yunnan, associate professor, Ph.D.; research interests: pattern recognition, biometric recognition. LI Haiyan (born 1976), female, from Kunming, Yunnan, professor, Ph.D.; research interests: pattern recognition, image processing. LI Hongsong (born 1974), male, from Kunming, Yunnan, associate professor, Ph.D.; research interests: image processing. DING Wenqian (born 1995), male, from Xiangyang, Hubei, M.S. candidate; research interests: image processing, deep learning.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (62066046).

Abstract

Abstract: With the rapid development of artificial intelligence, image classification in computer vision is no longer limited to identifying the major category of an object; images of the same category must also be divided into finer subcategories. To effectively discriminate the small differences between categories and to reduce interference from background factors, a fine-grained classification algorithm based on Attention-Attention Bilinear Pooling (AABP) was proposed. Firstly, an Inception V3 pre-trained model was used to extract global image features, and depthwise separable convolution was used to predict local attention regions on the feature maps. Then, the Weakly Supervised Data Augmentation Network (WS-DAN) algorithm was applied to feed the augmented images back into the network, so as to strengthen the generalization ability of the network and prevent overfitting. Finally, the further extracted attention feature regions were linearly fused in the AABP network to improve classification accuracy. Experimental results show that the algorithm achieves an accuracy of 88.51% and a top-5 accuracy of 97.65% on the CUB-200-2011 dataset, an accuracy of 89.77% and a top-5 accuracy of 99.27% on the Stanford Cars dataset, and an accuracy of 93.5% and a top-5 accuracy of 97.96% on the FGVC-Aircraft dataset.

Key words: fine-grained classification, linear fusion, weakly supervised, data augmentation, depthwise separable convolution
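The attention-based pooling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's AABP network, whose details are not given on this page; it only shows the generic attention-weighted pooling commonly associated with WS-DAN, in which each attention map weights the feature maps element-wise and the result is averaged over space, giving one feature vector per attention region. The function names `attention_pool` and `signed_sqrt_l2`, the toy 2 x 2 inputs, and the signed-square-root plus L2 normalization step are all assumptions chosen for illustration.

```python
from math import sqrt


def attention_pool(features, attentions):
    """Pool C feature maps under M attention maps into an M x C matrix.

    features:   list of C maps, each H x W (nested lists of numbers)
    attentions: list of M maps, each H x W
    Entry [k][c] is the spatial mean of attentions[k] * features[c].
    """
    height, width = len(features[0]), len(features[0][0])
    area = height * width
    pooled = []
    for att in attentions:
        row = []
        for fmap in features:
            total = sum(att[i][j] * fmap[i][j]
                        for i in range(height) for j in range(width))
            row.append(total / area)
        pooled.append(row)
    return pooled


def signed_sqrt_l2(matrix):
    """Flatten, apply a signed square root, then L2-normalize (assumed step)."""
    flat = [v for row in matrix for v in row]
    rooted = [(1.0 if v >= 0 else -1.0) * sqrt(abs(v)) for v in flat]
    norm = sqrt(sum(v * v for v in rooted))
    return [v / norm for v in rooted] if norm > 0 else rooted


# Hypothetical toy input: 2 feature channels, 2 attention maps, 2 x 2 grid.
features = [[[1, 2], [3, 4]], [[0, 1], [0, 1]]]
attentions = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]

part_features = attention_pool(features, attentions)
descriptor = signed_sqrt_l2(part_features)
print(part_features)  # [[0.25, 0.0], [1.0, 0.25]]
```

In a real network the feature maps would come from a convolutional backbone such as Inception V3 and the attention maps from a learned convolution; here both are hand-written lists so that the arithmetic can be checked by eye.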

CLC number: TP391.4

LU Xinwei, YU Pengfei, LI Haiyan, LI Hongsong, DING Wenqian. Weakly supervised fine-grained image classification algorithm based on attention-attention bilinear pooling[J]. Journal of Computer Applications, 2021, 41(5): 1319-1325.
