[1] 奚雪峰, 周国栋. 面向自然语言处理的深度学习研究[J]. 自动化学报,2016,42(10):1445-1465. (XI X F,ZHOU G D. A survey on deep learning for natural language processing[J]. Acta Automatica Sinica,2016,42(10):1445-1465.)
[2] 周飞燕, 金林鹏, 董军. 卷积神经网络研究综述[J]. 计算机学报,2017,40(6):1229-1251. (ZHOU F Y,JIN L P,DONG J. Review of convolutional neural network[J]. Chinese Journal of Computers,2017,40(6):1229-1251.)
[3] FARHADI A,HEJRATI M,SADEGHI M A,et al. Every picture tells a story:generating sentences from images[C]//Proceedings of the 2010 European Conference on Computer Vision,LNCS 6314. Berlin:Springer,2010:15-29.
[4] YANG Y,TEO C,DAUMÉ III H,et al. Corpus-guided sentence generation of natural images[C]//Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA:Association for Computational Linguistics,2011:444-454.
[5] VINYALS O,TOSHEV A,BENGIO S,et al. Show and tell:a neural image caption generator[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2015:3156-3164.
[6] KALCHBRENNER N,BLUNSOM P. Recurrent continuous translation models[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA:Association for Computational Linguistics,2013:1700-1709.
[7] XU K,BA J L,KIROS R,et al. Show,attend and tell:neural image caption generation with visual attention[C]//Proceedings of the 32nd International Conference on Machine Learning. New York:JMLR.org,2015:2048-2057.
[8] YOU Q,JIN H,WANG Z,et al. Image captioning with semantic attention[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2016:4651-4659.
[9] CHEN L,ZHANG H,XIAO J,et al. SCA-CNN:spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2017:6298-6306.
[10] PAPINENI K,ROUKOS S,WARD T,et al. BLEU:a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Stroudsburg, PA:Association for Computational Linguistics,2002:311-318.
[11] LIN C Y. ROUGE:a package for automatic evaluation of summaries[M]//Text Summarization Branches Out. Stroudsburg, PA:Association for Computational Linguistics,2004:74-81.
[12] VEDANTAM R,ZITNICK C L,PARIKH D. CIDEr:consensus-based image description evaluation[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2015:4566-4575.
[13] RENNIE S J,MARCHERET E,MROUEH Y,et al. Self-critical sequence training for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2017:1179-1195.
[14] HE K,ZHANG X,REN S,et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2016:770-778.
[15] REN S,HE K,GIRSHICK R,et al. Faster R-CNN:towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
[16] KARPATHY A,LI F. Deep visual-semantic alignments for generating image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(4):664-676.
[17] LU J,XIONG C,PARIKH D,et al. Knowing when to look:adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2017:3242-3250.
[18] YAO T,PAN Y,LI Y,et al. Boosting image captioning with attributes[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway:IEEE,2017:4904-4912.
[19] ANDERSON P,HE X,BUEHLER C,et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE,2018:6077-6086.