[1] VINYALS O,TOSHEV A,BENGIO S,et al. Show and tell:a neural image caption generator[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2015:3156-3164. [2] HE K,ZHANG X,REN S,et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:770-778. [3] SIMONYAN K,ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL].[2020-04-21]. https://arxiv.org/pdf/1409.1556.pdf. [4] WEI Y,XIA W,LIN M,et al. HCP:a flexible CNN framework for multi-label image classification[J]. IEEE Transactions on Software Engineering,2016,38(9):1901-1907. [5] SOCHER R, KARPATHY A, LE Q V, et al. Grounded compositional semantics for finding and describing images with sentences[J]. Transactions of the Association for Computational Linguistics,2014,2:207-218. [6] CHEN X, ZITNICK C L. Mind's eye:a recurrent visual representation for image caption generation[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2015:2422-2431. [7] GAO J,WANG S,WANG S,et al. Self-critical n-step training for image captioning[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2019:6293-6301. [8] LU J,XIONG C,PARIKH D,et al. Knowing when to look:adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2017:3242-3250. [9] CHEN L,ZHANG H,XIAO J,et al. SCA-CNN:spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:6298-6306. [10] 陈龙杰, 张钰, 张玉梅, 等. 基于多注意力多尺度特征融合的图像描述生成算法[J]. 计算机应用,2019,39(2):354-359. (CHEN L J,ZHANG Y,ZHANG Y M,et al. Image caption algorithm based on multi-attention and multi-scale feature fusion[J]. Journal of Computer Applications,2019,39(2):354-359.) [11] XU K,BA J L,KIROS R,et al. Show,attend and tell:neural image caption generation with visual attention[C]//Proceedings of the 32nd International Conference on Machine Learning. New York:JMLR. org,2015:2048-2057. [12] HOCHREITER S,SCHMIDHUBER J. Long short-term memory[J]. Neural Computation,1997,9(8):1735-1780. [13] 黄友文, 游亚东, 赵朋. 融合卷积注意力机制的图像描述生成模型[J]. 计算机应用,2020,40(1):23-27.(HUANG Y W, YOU Y D, ZHAO P. Image caption generation model with convolutional attention mechanism[J]. Journal of Computer Applications,2020,40(1):23-27.) [14] 杨丽, 吴雨茜, 王俊丽, 等. 循环神经网络研究综述[J]. 计算机应用,2018,38(S2):1-6,26.(YANG L,WU Y X,WANG J L, et al. Research on recurrent neural network[J]. Journal of Computer Applications,2018,38(S2):1-6,26.) [15] PAPINENI K,ROUKOS S,WARD T,et al. BLEU:a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA:Association for Computational Linguistics,2002:311-318. [16] VEDANTAM R,ZITNICK C L,PARIKH D. CIDEr:consensusbased image description evaluation[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2015:4566-4575. [17] LIN C Y. ROUGE:a package for automatic evaluation of summaries[C]//Proceedings of the 2004 ACL Workshop on Text Summarization Branches Out. Stroudsburg,PA:Association for Computational Linguistics,2004:74-81. [18] BANERJEE S,LAVIE A. METEOR:an automatic metric for mt evaluation with improved correlation with human judgments[C]//Proceedings of the 2005 ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Stroudsburg,PA:Association for Computational Linguistics,2005:65-72. [19] HODOSH M,YOUNG P,HOCKENMAIER J. Framing image description as a ranking task:data,models and evaluation metrics[J]. Journal of Artificial Intelligence Research, 2013, 47:853-899. [20] YOUNG P,LAI A,HODOSH M,et al. From image descriptions to visual denotations:new similarity metrics for semantic inference over event descriptions[J]. Transactions of the Association for Computational Linguistics,2014,2:67-78. [21] KARPATHY A,LI F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2015:3128-3137. [22] 陶云松, 张丽红. 基于双向注意力机制图像描述方法研究[J]. 测试技术学报,2019,33(4):347-350,364.(TAO Y S, ZHANG L H. Research on image description method based on bidirectional attentional mechanism[J]. Journal of Test and Measurement Technology,2019,33(4):347-350,364.) |