[1] 奚雪峰, 周国栋. 面向自然语言处理的深度学习研究[J]. 自动化学报,2016,42(10):1445-1465. (XI X F,ZHOU G D. A survey on deep learning for natural language processing[J]. Acta Automatica Sinica,2016,42(10):1445-1465.)
[2] 周飞燕, 金林鹏, 董军. 卷积神经网络研究综述[J]. 计算机学报,2017,40(6):1229-1251. (ZHOU F Y,JIN L P,DONG J. Review of convolutional neural network[J]. Chinese Journal of Computers,2017,40(6):1229-1251.)
[3] FARHADI A,HEJRATI M,SADEGHI M A,et al. Every picture tells a story:generating sentences from images[C]//Proceedings of the 2010 European Conference on Computer Vision,LNCS 6314. Berlin:Springer,2010:15-29.
[4] YANG Y,TEO C,DAUMÉ III H,et al. Corpus-guided sentence generation of natural images[C]//Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA:Association for Computational Linguistics,2011:444-454.
[5] VINYALS O,TOSHEV A,BENGIO S,et al. Show and tell:a neural image caption generator[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2015:3156-3164.
[6] KALCHBRENNER N,BLUNSOM P. Recurrent continuous translation models[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA:Association for Computational Linguistics,2013:1700-1709.
[7] XU K,BA J L,KIROS R,et al. Show,attend and tell:neural image caption generation with visual attention[C]//Proceedings of the 32nd International Conference on Machine Learning. New York:JMLR.org,2015:2048-2057.
[8] YOU Q,JIN H,WANG Z,et al. Image captioning with semantic attention[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2016:4651-4659.
[9] CHEN L,ZHANG H,XIAO J,et al. SCA-CNN:spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2017:6298-6306.
[10] PAPINENI K,ROUKOS S,WARD T,et al. BLEU:a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Stroudsburg, PA:Association for Computational Linguistics,2002:311-318.
[11] LIN C Y. ROUGE:a package for automatic evaluation of summaries[M]//Text Summarization Branches Out. Stroudsburg, PA:Association for Computational Linguistics,2004:74-81.
[12] VEDANTAM R,ZITNICK C L,PARIKH D. CIDEr:consensus-based image description evaluation[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2015:4566-4575.
[13] RENNIE S J,MARCHERET E,MROUEH Y,et al. Self-critical sequence training for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2017:1179-1195.
[14] HE K,ZHANG X,REN S,et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2016:770-778.
[15] REN S,HE K,GIRSHICK R,et al. Faster R-CNN:towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
[16] KARPATHY A,LI F. Deep visual-semantic alignments for generating image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(4):664-676.
[17] LU J,XIONG C,PARIKH D,et al. Knowing when to look:adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2017:3242-3250.
[18] YAO T,PAN Y,LI Y,et al. Boosting image captioning with attributes[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway:IEEE,2017:4904-4912.
[19] ANDERSON P,HE X,BUEHLER C,et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2018:6077-6086.