Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (5): 1262-1267. DOI: 10.11772/j.issn.1001-9081.2020071078

Special Issue: Artificial intelligence

• Artificial intelligence •

Image description generation algorithm based on improved attention mechanism

LI Wenhui, ZENG Shangyou, WANG Jinjin   

  1. School of Electronic Engineering, Guangxi Normal University, Guilin Guangxi 541004, China
  • Received: 2020-07-23 Revised: 2020-10-06 Online: 2021-05-10 Published: 2020-11-12
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (11465004).

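  • Corresponding author: ZENG Shangyou
  • About the authors: LI Wenhui (born 1997), female, from Hengyang, Hunan, M.S. candidate; research interests: deep learning and artificial intelligence. ZENG Shangyou (born 1974), male, from Shuangfeng, Hunan, Ph.D., professor; research interests: neural networks, artificial intelligence, complex networks, biological information processing and biochips. WANG Jinjin (born 1995), female, from Anqing, Anhui, M.S. candidate; research interests: deep learning and artificial intelligence.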

Abstract: Image description expresses the global information contained in an image as sentences, which requires the generation model both to extract image information and to express that information in natural language. The traditional model, built on a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), can translate images into sentences to a certain extent, but it extracts key image information with low accuracy and trains slowly. To solve this problem, an image description generation model with an improved attention mechanism, based on CNN and Long Short-Term Memory (LSTM) network, was proposed. VGG19 and ResNet101 were used as the feature extraction networks, and group convolution was introduced into the attention mechanism to replace the traditional fully connected operation, so as to improve the evaluation indices. The model was trained on the public datasets Flickr8K and Flickr30K and validated with several evaluation indices: BLEU (Bilingual Evaluation Understudy), ROUGE_L (Recall-Oriented Understudy for Gisting Evaluation), CIDEr (Consensus-based Image Description Evaluation) and METEOR (Metric for Evaluation of Translation with Explicit Ordering). Experimental results show that, compared with the model using the traditional attention mechanism, the proposed model improves the accuracy of the image description task and outperforms the traditional model on all four evaluation indices.

Key words: image description, natural language processing, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, attention mechanism
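
The abstract's central idea, replacing the fully connected projections inside soft attention with group convolution, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the additive scoring form, the helper names (`grouped_projection`, `make_weights`, `count_params`, `soft_attention`) and all dimensions are assumptions chosen for illustration. A grouped 1x1 convolution is modelled as a block-wise linear map in which each group of output channels sees only its own slice of input channels, and `groups=1` reduces to an ordinary fully connected layer without bias.

```python
import math
import random


def make_weights(d_in, d_out, groups, rng):
    """Create weights for a grouped 1x1 convolution: one (d_out/groups) x (d_in/groups) matrix per group."""
    assert d_in % groups == 0 and d_out % groups == 0
    return [[[rng.uniform(-0.1, 0.1) for _ in range(d_in // groups)]
             for _ in range(d_out // groups)]
            for _ in range(groups)]


def count_params(weights):
    """Number of scalar weights across all groups."""
    return sum(len(row) for matrix in weights for row in matrix)


def grouped_projection(x, weights):
    """Project vector x; each group only reads its own slice of input channels."""
    in_per_group = len(x) // len(weights)
    out = []
    for g, matrix in enumerate(weights):
        chunk = x[g * in_per_group:(g + 1) * in_per_group]
        for row in matrix:
            out.append(sum(w * v for w, v in zip(row, chunk)))
    return out


def soft_attention(features, hidden, w_feat, w_hid, v):
    """Additive soft attention over image regions (assumed form, for illustration).

    features: list of region vectors from the CNN; hidden: current LSTM state.
    Returns the attention weights and the attended context vector.
    """
    h_proj = grouped_projection(hidden, w_hid)
    scores = []
    for region in features:
        a_proj = grouped_projection(region, w_feat)
        scores.append(sum(vk * math.tanh(p + q)
                          for vk, p, q in zip(v, a_proj, h_proj)))
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(features[0])
    context = [sum(al * region[d] for al, region in zip(alphas, features))
               for d in range(dim)]
    return alphas, context


# Toy dimensions (hypothetical): 5 regions, 8 feature channels, 4 attention units, 4 groups.
rng = random.Random(0)
D, H, K, L, G = 8, 8, 4, 5, 4
features = [[rng.random() for _ in range(D)] for _ in range(L)]
hidden = [rng.random() for _ in range(H)]
w_feat = make_weights(D, K, G, rng)
w_hid = make_weights(H, K, G, rng)
v = [rng.uniform(-0.1, 0.1) for _ in range(K)]

alphas, context = soft_attention(features, hidden, w_feat, w_hid, v)
dense = make_weights(D, K, 1, rng)  # groups=1 is the fully connected baseline
print(count_params(dense), count_params(w_feat))  # prints "32 8"
```

With G groups the projection holds 1/G of the weights of its fully connected counterpart (32 versus 8 in this toy setting), which is the kind of saving that motivates the substitution; whether it also improves caption quality is the empirical question the paper evaluates.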

