Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (7): 2129-2133. DOI: 10.11772/j.issn.1001-9081.2018112400

• Virtual reality and multimedia computing •

Automatic image annotation based on generative adversarial network

SHUI Liucheng (税留成), LIU Weizhong (刘卫忠), FENG Zhuoming (冯卓明)

  1. School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan Hubei 430074, China
  • Received: 2018-12-05 Revised: 2019-01-21 Online: 2019-07-10 Published: 2019-07-15
  • Corresponding author: SHUI Liucheng
  • About the authors: SHUI Liucheng (born 1992), male, from Chengdu, Sichuan, master's student; research interests: computer vision, image annotation. LIU Weizhong (born 1972), male, from Jingzhou, Hubei, associate professor, Ph.D.; research interests: multimedia source coding, machine learning. FENG Zhuoming (born 1970), male, from Jingzhou, Hubei, lecturer, Ph.D.; research interest: wireless communication.

Abstract:

In deep learning-based image annotation models, the number of output-layer neurons is proportional to the size of the annotation vocabulary, so the model structure has to change whenever the vocabulary changes. To address this problem, a new annotation model combining a Generative Adversarial Network (GAN) and Word2vec is proposed. First, Word2vec maps each word in the annotation vocabulary to a fixed multidimensional word vector. Second, a neural network model called GAN-W (GAN-Word2vec annotation) is built on a GAN so that the number of output-layer neurons equals the dimension of the word vectors and no longer depends on the vocabulary size. Finally, the annotation is determined by ranking the results of multiple model outputs. Experiments were conducted on the Corel 5K and IAPRTC-12 image annotation datasets. On Corel 5K, the precision, recall and F1 value of GAN-W are 5, 14 and 9 percentage points higher, respectively, than those of Convolutional Neural Network Regression (CNN-R); on IAPRTC-12, they are 2, 6 and 3 percentage points higher, respectively, than those of Two-Pass K-Nearest Neighbor (2PKNN). The results show that GAN-W removes the dependence of the output-layer size on the vocabulary size. In addition, the number of labels assigned to each image is adaptive, which brings the annotation results closer to real annotation practice.
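The final step, turning several model outputs into a set of labels, can be illustrated with a minimal sketch. The abstract states only that the multiple outputs are sorted, so everything below is an assumption made for illustration: each output is decoded to its nearest vocabulary word by cosine similarity, words are ranked by how often they are chosen, and a hypothetical cut-off `min_share` decides how many labels survive. The toy vectors and the names `nearest_word` and `annotate` are likewise invented here and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_word(vector, word_vectors):
    """Return the vocabulary word whose vector is most similar to `vector`."""
    return max(word_vectors, key=lambda w: cosine(vector, word_vectors[w]))


def annotate(outputs, word_vectors, min_share=0.2):
    """Rank words by how often they are the nearest neighbour of an output.

    `min_share` is a hypothetical cut-off (not from the paper): a word is kept
    only if it wins at least this fraction of the outputs, so the number of
    labels can differ from image to image.
    """
    votes = Counter(nearest_word(o, word_vectors) for o in outputs)
    ranked = votes.most_common()  # sorted by vote count, highest first
    return [word for word, n in ranked if n / len(outputs) >= min_share]


# Toy 3-dimensional "word vectors"; real Word2vec vectors are much longer.
WORD_VECTORS = {
    "sky": [1.0, 0.0, 0.0],
    "sea": [0.0, 1.0, 0.0],
    "tree": [0.0, 0.0, 1.0],
}

# Five made-up model outputs for one image, same dimension as the word vectors.
OUTPUTS = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],
    [0.7, 0.1, 0.2],
    [0.2, 0.8, 0.1],
]

labels = annotate(OUTPUTS, WORD_VECTORS)
print(labels)  # prints ['sky', 'sea']
```

In this sketch, adding a word to `WORD_VECTORS` changes only the lookup table. The length of each output vector stays the same, which mirrors the property the abstract claims for the output layer.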

Key words: automatic image annotation, deep learning, Generative Adversarial Network (GAN), label vectorization, transfer learning

CLC number: