Image caption model based on attention feature adaptive recalibration

doi:10.11772/j.issn.1001-9081.2019122170

Journal of Computer Applications (计算机应用)


WEI Renyu (韦人予)¹, MENG Zuqiang (蒙祖强)²

1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China

Received: 2019-12-25 Revised: 2020-03-01 Online: 2020-03-01 Published: 2020-05-13
Corresponding author: MENG Zuqiang (蒙祖强)

Abstract: Image caption models built on deep learning frameworks select image features inaccurately and use them insufficiently, which lowers the overall quality of the generated captions. To address this, an image caption model based on adaptive recalibration of attention features is proposed. First, a convolutional neural network extracts image features, and an attention mechanism is fused with them so that the model dynamically focuses on different image regions as it outputs words in order, yielding attention features that carry location information. Then, a channel activation layer captures the dependencies between channels and adaptively recalibrates the attention features, which strengthens their representational power and in turn improves the quality of the sentences generated by the Long Short-Term Memory (LSTM) network. Comparison experiments were conducted on three standard datasets: MS COCO, Flickr8K and Flickr30K. On MS COCO, the proposed model reaches BLEU_1, BLEU_2, BLEU_3, BLEU_4, Meteor and CIDEr scores of 69.4%, 52.3%, 38.6%, 28.5%, 23.3% and 83.6% respectively, outperforming traditional neural network image caption models and generating more accurate image captions.
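
The abstract describes two steps, attention over image regions followed by channel-wise recalibration of the resulting attention feature, without giving the layer definitions. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes dot-product soft attention and a squeeze-and-excitation-style gate (ReLU bottleneck followed by a sigmoid) for the channel activation layer. All names (`soft_attention`, `recalibrate`, `w_query`, `w_down`, `w_up`) and all sizes are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(regions, hidden, w_query):
    """Weight each image region by its relevance to the decoder state.

    regions: L region vectors with C channels each (flattened CNN feature map).
    hidden:  decoder state of size H (for example, an LSTM hidden state).
    w_query: hypothetical C x H projection of the state into feature space.
    """
    query = matvec(w_query, hidden)
    scores = [sum(r * q for r, q in zip(region, query)) for region in regions]
    alpha = softmax(scores)
    channels = len(regions[0])
    # Attention feature: weighted sum of the region vectors, one value per channel.
    context = [sum(a * region[c] for a, region in zip(alpha, regions))
               for c in range(channels)]
    return context, alpha


def recalibrate(context, w_down, w_up):
    """Rescale each channel of the attention feature with a learned gate.

    Assumption: a squeeze-and-excitation-style gate stands in for the
    channel activation layer, whose exact form the abstract does not state.
    """
    squeezed = [max(0.0, x) for x in matvec(w_down, context)]  # ReLU bottleneck
    gates = [1.0 / (1.0 + math.exp(-x)) for x in matvec(w_up, squeezed)]
    return [g * x for g, x in zip(gates, context)], gates


def random_matrix(rows, cols, rng):
    """Untrained stand-in weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
L, C, H, R = 4, 6, 3, 2  # regions, channels, state size, bottleneck size
regions = random_matrix(L, C, rng)
hidden = [rng.uniform(-1.0, 1.0) for _ in range(H)]
context, alpha = soft_attention(regions, hidden, random_matrix(C, H, rng))
recalibrated, gates = recalibrate(context, random_matrix(R, C, rng),
                                  random_matrix(C, R, rng))
print("attention weights:", [round(a, 3) for a in alpha])
print("channel gates:    ", [round(g, 3) for g in gates])
```

In a full captioning model the recalibrated feature would be fed to the LSTM at each decoding step, and the gate weights would be learned jointly with the rest of the network.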

Key words: image caption, deep learning, attention mechanism, multimodal, natural language processing (NLP)

CLC Number: TP391.41

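Cite this article: WEI Renyu, MENG Zuqiang. Image caption model based on attention feature adaptive recalibration[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2019122170.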

