Journal of Computer Applications, 2020, Vol. 40, Issue (4): 985-989. DOI: 10.11772/j.issn.1001-9081.2019091569

• Artificial intelligence •

Image description generation method based on multi-spatial mixed attention

LIN Xianzao, LIU Jun, TIAN Sheng, XU Xiaokang, JIANG Tao

  1. Fundamental Science on Communication Information Transmission and Fusion Technology Laboratory, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
  • Received: 2019-09-16 Revised: 2019-10-28 Publication date: 2020-04-10 Release date: 2019-11-04
  • Corresponding author: LIN Xianzao
  • About the authors: LIN Xianzao (1994-), male, born in Wenzhou, Zhejiang, M.S. candidate; research interests: natural language processing, computer vision, reinforcement learning. LIU Jun (1971-), male, born in Anshun, Guizhou, professor, Ph.D.; research interests: pattern recognition, intelligent systems, object detection, information fusion. TIAN Sheng (1994-), male, born in Tongling, Anhui, M.S. candidate; research interests: object detection, object tracking. XU Xiaokang (1996-), male, born in Chuzhou, Anhui, M.S. candidate; research interests: deep learning, object detection. JIANG Tao (1995-), male, born in Changzhou, Jiangsu, M.S. candidate; research interests: object detection, information fusion.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61673146), the National Natural Science Foundation of China Major Instrument Project (61427808), and the Key Research and Development Project of Zhejiang Province (2019C05005).

Abstract: To address the lack of automatic information generation in offshore ship monitoring systems, and to help build an intelligent ship monitoring system, an image description generation method based on multi-spatial mixed attention is proposed for describing offshore ship images. Image description generation asks a computer to describe the content of an image in linguistically well-formed text. First, a multi-spatial mixed attention model is pretrained on the encoded features of the regions of interest in the image; then the pretrained decoding model is fine-tuned with a loss function reformulated using policy gradient, which yields the final model. Experimental results on the MSCOCO (Microsoft Common Objects in Context) image description dataset show that the proposed model improves on previous attention models in image description evaluation metrics such as the CIDEr score. On a self-constructed ship description dataset, the model automatically describes the main content of ship images, indicating that the method can provide data support for automatic information generation.

Key words: image description, deep learning, attention mechanism, information generation, multi-spatial mixed attention
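
The two-stage training outlined in the abstract (pretraining, then fine-tuning with a policy-gradient loss) can be illustrated with a minimal sketch. The abstract does not specify the exact losses, so everything here is an assumption for illustration: the pretraining stage is taken to use word-level cross-entropy, the fine-tuning stage a plain REINFORCE surrogate with a scalar baseline, and the reward is a placeholder number standing in for a caption score such as CIDEr. The function names and toy logits are hypothetical and are not taken from the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy_loss(step_logits, target_ids):
    """Assumed stage 1: mean negative log-likelihood of the reference words."""
    nll = [-log_softmax(logits)[t] for logits, t in zip(step_logits, target_ids)]
    return sum(nll) / len(nll)

def policy_gradient_loss(step_logits, sampled_ids, reward, baseline):
    """Assumed stage 2: REINFORCE surrogate loss for one sampled caption.

    Minimising -(reward - baseline) * sum_t log p(w_t) raises the probability
    of sampled captions that score above the baseline and lowers it otherwise.
    """
    log_prob = sum(log_softmax(logits)[w]
                   for logits, w in zip(step_logits, sampled_ids))
    return -(reward - baseline) * log_prob

# Toy decoder outputs: 2 time steps over a 3-word vocabulary (made-up numbers).
toy_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
xe = cross_entropy_loss(toy_logits, target_ids=[0, 1])
# reward and baseline are placeholders, not values from the paper.
pg = policy_gradient_loss(toy_logits, sampled_ids=[0, 1], reward=1.2, baseline=0.9)
```

In this sketch the surrogate loss vanishes when the sampled caption scores exactly at the baseline, so only captions that beat or miss the baseline change the model. A real system would backpropagate these losses through the attention decoder with an automatic differentiation framework.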

CLC number: