Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (5): 1319-1325. DOI: 10.11772/j.issn.1001-9081.2020071105

Special Issue: Artificial Intelligence

• Artificial intelligence •

Weakly supervised fine-grained image classification algorithm based on attention-attention bilinear pooling

LU Xinwei, YU Pengfei, LI Haiyan, LI Hongsong, DING Wenqian   

  1. School of Information Science and Engineering, Yunnan University, Kunming Yunnan 650500, China
  • Received: 2020-07-27 Revised: 2020-09-29 Online: 2021-05-10 Published: 2021-05-19
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (62066046).


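  • Corresponding author: YU Pengfei
  • About the authors: LU Xinwei (born 1995), male, from Wuxi, Jiangsu, M.S. candidate, research interests: image processing, deep learning; YU Pengfei (born 1974), male, from Kunming, Yunnan, associate professor, Ph.D., research interests: pattern recognition, biometric recognition; LI Haiyan (born 1976), female, from Kunming, Yunnan, professor, Ph.D., research interests: pattern recognition, image processing; LI Hongsong (born 1974), male, from Kunming, Yunnan, associate professor, Ph.D., research interests: image processing; DING Wenqian (born 1995), male, from Xiangyang, Hubei, M.S. candidate, research interests: image processing, deep learning.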

Abstract: With the rapid development of artificial intelligence, image classification is expected not only to identify the broad category of an object but also to divide images of the same category into finer subcategories. To discriminate subtle inter-class differences effectively and to reduce interference from the background, a fine-grained classification algorithm based on Attention-Attention Bilinear Pooling (AABP) was proposed. Firstly, a pre-trained Inception V3 model was used to extract global image features, and local attention regions were predicted on the feature maps with depthwise separable convolution. Then, the Weakly Supervised Data Augmentation Network (WS-DAN) algorithm was applied to feed the augmented images back into the network, which strengthens generalization and helps prevent overfitting. Finally, the further extracted attention features were linearly fused in the AABP network to improve classification accuracy. Experimental results show that the algorithm achieves an accuracy of 88.51% and a top-5 accuracy of 97.65% on the CUB-200-2011 dataset, 89.77% and 99.27% on the Stanford Cars dataset, and 93.5% and 97.96% on the FGVC-Aircraft dataset.
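
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows (a) attention pooling in the WS-DAN style, where each attention map weights every feature map element-wise before spatial averaging, and (b) how a top-k accuracy such as the reported top-5 figure is typically computed. It is a minimal pure-Python sketch, not the authors' AABP network: the function names, the nested-list tensor layout, and the toy inputs are assumptions made for illustration only.

```python
def attention_pooling(features, attentions):
    """Pool C feature maps with M attention maps (all H x W nested lists).

    Returns an M x C matrix whose entry [k][c] is the spatial mean of
    attentions[k] * features[c] (element-wise product).
    Hypothetical helper: a generic WS-DAN-style pooling step, assumed
    here for illustration; the paper's AABP fusion may differ.
    """
    h, w = len(features[0]), len(features[0][0])
    pooled = []
    for att in attentions:
        row = []
        for feat in features:
            total = sum(att[i][j] * feat[i][j]
                        for i in range(h) for j in range(w))
            row.append(total / (h * w))
        pooled.append(row)
    return pooled


def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Toy inputs (made up): 2 feature maps and 2 attention maps of size 2 x 2.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 1.0], [0.0, 1.0]]]
attentions = [[[1.0, 0.0], [0.0, 0.0]],
              [[0.5, 0.5], [0.5, 0.5]]]
part_features = attention_pooling(features, attentions)

# Toy class scores for 2 samples over 3 classes, with true labels 2 and 1.
scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
labels = [2, 1]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

In a real pipeline the feature maps would come from the Inception V3 backbone and the attention maps from a convolutional layer on top of it; the nested lists here only stand in for those tensors.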

Key words: fine-grained classification, linear fusion, weakly supervised, data augmentation, depthwise separable convolution

