[1] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3156-3164.
[2] LI W H, ZENG S Y, WANG J J. Image description generation algorithm based on improved attention mechanism[J]. Journal of Computer Applications, 2021, 41(5): 1262-1267. (in Chinese)
[3] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2021-01-20]. https://arxiv.org/pdf/1409.1556.pdf.
[4] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[5] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[6] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[EB/OL]. (2016-05-19)[2019-09-01]. https://arxiv.org/pdf/1409.0473.pdf.
[7] XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]//Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 2048-2057.
[8] CHEN L, ZHANG H W, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6298-6306.
[9] XIAO X Y, WANG L F, DING K, et al. Dense semantic embedding network for image captioning[J]. Pattern Recognition, 2019, 90: 285-296.
[10] ZHANG M X, YANG Y, ZHANG H W, et al. More is better: precise and detailed image captioning using online positive recall and missing concepts mining[J]. IEEE Transactions on Image Processing, 2019, 28(1): 32-44.
[11] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[12] LU J S, XIONG C M, PARIKH D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3242-3250.
[13] ZHAO X H, LI X. Image captioning algorithm based on multi-feature extraction[J]. Journal of Computer Applications, 2021, 41(6): 1640-1646. (in Chinese)
[14] XIAO X Y, WANG L F, DING K, et al. Deep hierarchical encoder-decoder network for image captioning[J]. IEEE Transactions on Multimedia, 2019, 21(11): 2942-2956.
[15] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[16] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944.
[17] ZHANG Z J, WU Q, WANG Y, et al. High-quality image captioning with fine-grained and semantic-guided visual attention[J]. IEEE Transactions on Multimedia, 2019, 21(7): 1681-1693.
[18] YAO T, PAN Y W, LI Y H, et al. Boosting image captioning with attributes[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 4904-4912.
[19] WU Q, SHEN C H, WANG P, et al. Image captioning and visual question answering based on attributes and external knowledge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1367-1381.
[20] YU N G, HU X L, SONG B H, et al. Topic-oriented image captioning based on order-embedding[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2743-2754.
[21] ANDERSON P, HE X D, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6077-6086.
[22] WU L X, XU M, WANG J Q, et al. Recall what you see continually using GridLSTM in image captioning[J]. IEEE Transactions on Multimedia, 2020, 22(3): 808-818.
[23] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
[24] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2002: 311-318.
[25] BANERJEE S, LAVIE A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the 2005 ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Stroudsburg, PA: Association for Computational Linguistics, 2005: 65-72.
[26] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Proceedings of the ACL 2004 Workshop on Text Summarization. Stroudsburg, PA: Association for Computational Linguistics, 2004: 74-81.
[27] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 4566-4575.
[28] KINGMA D P, BA J M. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30)[2020-04-22]. https://arxiv.org/pdf/1412.6980.pdf.
[29] WEI R Y, MENG Z Q. Image caption model based on attention feature adaptive recalibration[J]. Journal of Computer Applications, 2020, 40(S1): 45-50. (in Chinese)
[30] WU C L, YUAN S Z, CAO H W, et al. Hierarchical attention-based fusion for image caption with multi-grained rewards[J]. IEEE Access, 2020, 8: 57943-57951.