[1] 陈龙杰,张钰,张玉梅,等. 基于多注意力多尺度特征融合的图像描述生成算法[J]. 计算机应用, 2019, 39(2):354-359. (CHEN L J, ZHANG Y, ZHANG Y M, et al. Image caption algorithm based on multi-attention and multi-scale feature fusion[J]. Journal of Computer Applications, 2019, 39(2):354-359.) [2] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell:a neural image caption generator[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2015:3156-3164. [3] XU K, BA J, KIROS R, et al. Show, attend and tell:neural image caption generation with visual attention[C]//Proceedings of the 2015 International Conference on Machine Learning. New York:International Machine Learning Society, 2015:2048-2057. [4] 汤鹏杰,谭云兰,李金忠. 融合图像场景及物体先验知识的图像描述生成模型[J]. 中国图象图形学报, 2017, 22(9):1251-1260. (TANG P J, TAN Y L, LI J Z. Image description based on the fusion of scene and object category prior knowledge[J]. Journal of Image and Graphics, 2017, 22(9):1251-1260) [5] 杨楠,南琳,张丁一,等. 基于深度学习的图像描述研究[J]. 红外与激光工程, 2018, 47(2):9-16. (YANG N, NAN L, ZHANG D Y, et al. Research on image interpretation based on deep learning[J]. Infrared and Laser Engineering, 2018, 47(2):9-16.) [6] SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, inception-ResNet and the impact of residual connections on learning[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. Pola Alto, CA:AAAI Press, 2017:4278-4284. [7] LEE J, SEO S, CHOI Y S. Semantic relation classification via bidirectional LSTM networks with entity-aware attention using latent entity typing[J]. Symmetry, 2019, 11(6):No.785. [8] LIU Y, LIU Z, CHUA T S, et al. Topical word embeddings[C]//Proceedings of the 29th AAAI Conference on Artificial Intelligence. Pola Alto, CA:AAAI Press, 2015:2418-2424. [9] 杨丽,吴雨茜,王俊丽,等. 循环神经网络研究综述[J]. 计算机应用, 2018, 38(S2):1-6, 26. (YANG L, WU Y X, WANG J L, et al. Research on recurrent neural network[J]. Journal of Computer Applications, 2018, 38(S2):1-6, 26) [10] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr:consensus-based image description evaluation[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2015:4566-4575. [11] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4):664-676. [12] DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2015:2625-2634. [13] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4):664-676. [14] 汤鹏杰,王瀚漓,许恺晟. LSTM逐层多目标优化及多层概率融合的图像描述[J]. 自动化学报, 2018, 44(7):1237-1249. (TANG P J, WANG H L, XU K S. Multi-objective layer-wise optimization and multi-level probability fusion for image description generation using LSTM[J]. Acta Automatica Sinica, 2018, 44(7):1237-1249.) |