Shipping monitoring image recognition model based on attention mechanism network

doi:10.11772/j.issn.1001-9081.2020121899

Abstract

Abstract: The existing shipping monitoring image recognition model, Convolutional 3D (C3D), has three problems: its ability to learn intermediate representations is limited, its extraction of effective features is easily disturbed by noise, and its feature extraction ignores the relationship between global and local features. To solve these problems, a new shipping monitoring image recognition model based on an attention mechanism network was proposed. The model was built on the Convolutional Neural Network (CNN) framework. Firstly, the shallow features of the image were extracted by a feature extractor. Then, attention information was generated from the different response strengths of the CNN to the activated features in different regions, and this attention was used to extract local discriminative features. Finally, a multi-branch CNN structure was used to fuse the local discriminative features with the global texture features of the image, so that the interaction between these two kinds of features improved the ability of the CNN to learn intermediate representations. Experimental results show that the proposed model achieves a recognition accuracy of 91.8% on the shipping image dataset, which is 7.2 percentage points higher than that of the current C3D model and 0.6 percentage points higher than that of the Discriminative Filter bank within a Convolutional Neural Network (DFL-CNN) model. These results indicate that the proposed model can accurately judge the state of ships and can be effectively applied to shipping monitoring projects.
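The three steps in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: all function names are hypothetical, and every concrete choice is an assumption made for illustration only. Attention is approximated by summing activations over channels (a simple stand-in for the CNN's regional response strength), the local region is the bounding box of cells whose attention exceeds the mean, global texture is summarised by a position-normalised Gram matrix in the spirit of Gatys et al. [23], and the multi-branch fusion is reduced to concatenating the two descriptors.

```python
# Toy sketch (hypothetical, not the authors' code) of attention-guided local
# features fused with global texture features. Assumptions: a feature map is a
# nested list [channels][height][width]; attention = channel-summed activation;
# local region = bounding box of above-mean attention; texture = Gram matrix.
from typing import List, Tuple

Grid = List[List[float]]
FeatureMap = List[Grid]  # [channels][height][width]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def attention_map(fmap: FeatureMap) -> Grid:
    """Sum activations over channels so strongly responding cells score high."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[sum(ch[i][j] for ch in fmap) for j in range(w)] for i in range(h)]


def attended_box(att: Grid) -> Box:
    """Bounding box of cells whose attention is above the mean attention."""
    values = [v for row in att for v in row]
    mean = sum(values) / len(values)
    hits = [(i, j) for i, row in enumerate(att) for j, v in enumerate(row) if v > mean]
    if not hits:  # flat attention: fall back to the whole map
        return 0, 0, len(att) - 1, len(att[0]) - 1
    rows = [i for i, _ in hits]
    cols = [j for _, j in hits]
    return min(rows), min(cols), max(rows), max(cols)


def crop(fmap: FeatureMap, box: Box) -> FeatureMap:
    """Cut the same spatial window out of every channel."""
    top, left, bottom, right = box
    return [[row[left:right + 1] for row in ch[top:bottom + 1]] for ch in fmap]


def local_descriptor(fmap: FeatureMap) -> List[float]:
    """Global average pooling of each channel of the (cropped) map."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def gram_texture(fmap: FeatureMap) -> List[float]:
    """Flattened Gram matrix G[a][b] = mean over positions of F_a * F_b."""
    flat = [[v for row in ch for v in row] for ch in fmap]
    n = len(flat[0])
    return [sum(x * y for x, y in zip(fa, fb)) / n for fa in flat for fb in flat]


def fuse(fmap: FeatureMap) -> List[float]:
    """Concatenate the local (attended) and global (texture) descriptors."""
    box = attended_box(attention_map(fmap))
    return local_descriptor(crop(fmap, box)) + gram_texture(fmap)


if __name__ == "__main__":
    fmap = [
        [[0, 0, 0], [0, 4, 2], [0, 2, 1]],
        [[1, 0, 0], [0, 2, 2], [0, 0, 1]],
    ]
    print(attended_box(attention_map(fmap)))  # -> (1, 1, 2, 2)
    print(fuse(fmap)[:2])  # -> [2.25, 1.25]
```

In a real model the feature map would come from a CNN backbone and the fused vector would feed a classifier; plain nested lists are used here only to keep the sketch dependency-free.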

Key words: intelligent monitoring, deep learning, Convolutional Neural Network (CNN), image recognition, attention mechanism

CLC Number: TP391.4

ZHANG Kaiyue, ZHANG Hong. Shipping monitoring image recognition model based on attention mechanism network[J]. Journal of Computer Applications, 2021, 41(10): 3010-3016.

References

[1] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway:IEEE, 2015:4489-4497.
[2] ZHANG K, LIU N, YUAN X F, et al. Fine-grained age estimation in the wild with attention LSTM networks[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(9):3140-3152.
[3] YANG Z, LUO T G, WANG D, et al. Learning to navigate for fine-grained classification[C]//Proceedings of the 2018 European Conference on Computer Vision, LNCS 11218. Cham:Springer, 2018:438-454.
[4] BERG T, BELHUMEUR P N. POOF:part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation[C]//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2013:955-962.
[5] XIE L X, TIAN Q, HONG R C, et al. Hierarchical part matching for fine-grained visual categorization[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision. Piscataway:IEEE, 2013:1641-1648.
[6] ZHANG N, DONAHUE J, GIRSHICK R, et al. Part-based R-CNNs for fine-grained category detection[C]//Proceedings of the 2014 European Conference on Computer Vision, LNCS 8689. Cham:Springer, 2014:834-849.
[7] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2014:580-587.
[8] LIN D, SHEN X Y, LU C W, et al. Deep LAC:deep localization, alignment and classification for fine-grained recognition[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2015:1666-1674.
[9] LIN T Y, ROYCHOWDHURY A, MAJI S. Bilinear CNN models for fine-grained visual recognition[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway:IEEE, 2015:1449-1457.
[10] LIN T Y, MAJI S. Improved bilinear pooling with CNNs[C]//Proceedings of the 2017 British Machine Vision Conference. Durham:BMVA Press, 2017:No. 117.
[11] LI P H, XIE J T, WANG Q L, et al. Towards faster training of global covariance pooling networks by iterative matrix square root normalization[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2018:947-955.
[12] JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge:MIT Press, 2015:2017-2025.
[13] FU J L, ZHENG H L, MEI T. Look closer to see better:recurrent attention convolutional neural network for fine-grained image recognition[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:4476-4484.
[14] ZHENG H L, FU J L, MEI T, et al. Learning multi-attention convolutional neural network for fine-grained image recognition[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway:IEEE, 2017:5219-5227.
[15] ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks[C]//Proceedings of the 2014 European Conference on Computer Vision, LNCS 8689. Cham:Springer, 2014:818-833.
[16] ZHOU B L, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:2921-2929.
[17] LEE C Y, XIE S N, GALLAGHER P, et al. Deeply-supervised nets[C]//Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. New York:JMLR.org, 2015:562-570.
[18] JIANG Z L, WANG Y M, DAVIS L, et al. Learning discriminative features via label consistent neural network[C]//Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision. Piscataway:IEEE, 2017:207-216.
[19] JIN X J, CHEN Y P, DONG J, et al. Collaborative layer-wise discriminative learning in deep neural networks[C]//Proceedings of the 2016 European Conference on Computer Vision, LNCS 9911. Cham:Springer, 2016:733-749.
[20] LIU W, ANGUELOV D, ERHAN D, et al. SSD:single shot MultiBox detector[C]//Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham:Springer, 2016:21-37.
[21] ZHANG X P, XIONG H K, ZHOU W G, et al. Picking deep filter responses for fine-grained image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:1134-1142.
[22] WANG Y M, MORARIU V I, DAVIS L S. Learning a discriminative filter bank within a CNN for fine-grained recognition[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2018:4148-4157.
[23] GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:2414-2423.
[24] HU T, QI H G, HUANG Q M, et al. See better before looking closer:weakly supervised data augmentation network for fine-grained visual classification[EB/OL]. (2019-03-23)[2020-03-23]. https://arxiv.org/pdf/1901.09891.pdf.
[25] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2020-03-20]. https://arxiv.org/pdf/1409.1556.pdf.
[26] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:770-778.
[27] 张雪芹, 余丽君. 基于判别关键域和深度学习的植物图像分类[J]. 计算机工程与设计, 2020, 41(3):742-748.(ZHANG X Q, YU L J. Classification of plant images based on discriminating key domains and deep learning[J]. Computer Engineering and Design, 2020, 41(3):742-748.)
[28] 姜代红, 张三友, 刘其开. 基于特征重标定生成对抗网络的图像分类算法[J]. 计算机应用研究, 2020, 37(3):932-935.(JIANG D H, ZHANG S Y, LIU Q K. Image classification algorithm based on feature recalibration GAN[J]. Application Research of Computers, 2020, 37(3):932-935.)
[29] 孙敏, 李旸, 庄正飞, 等. 基于并行混合网络融入注意力机制的情感分析[J]. 计算机应用, 2020, 40(9):2543-2548.(SUN M, LI Y, ZHUANG Z F, et al. Sentiment analysis based on parallel hybrid network and attention mechanism[J]. Journal of Computer Applications, 2020, 40(9):2543-2548.)
[30] 边小勇, 江沛龄, 赵敏, 等. 基于多分支神经网络模型的弱监督细粒度图像分类方法[J]. 计算机应用, 2020, 40(5):1295-1300.(BIAN X Y, JIANG P L, ZHAO M, et al. Multi-branch neural network model based weakly supervised fine-grained image classification method[J]. Journal of Computer Applications, 2020, 40(5):1295-1300.)