Automatic image annotation based on generative adversarial network

doi:10.11772/j.issn.1001-9081.2018112400

Abstract:

In deep learning-based image annotation models, the number of output-layer neurons is proportional to the size of the annotation vocabulary, so the model structure must change whenever the vocabulary changes. To address this problem, a new annotation model combining Generative Adversarial Network (GAN) and Word2vec is proposed. First, the annotation vocabulary is mapped to fixed-dimension word vectors through Word2vec. Second, a neural network model called GAN-W (GAN-Word2vec annotation) is built on GAN, so that the number of output-layer neurons equals the dimension of the word vectors and no longer depends on the vocabulary size. Finally, the annotation is determined by ranking the results of multiple model outputs. Experiments were conducted on the image annotation datasets Corel 5K and IAPRTC-12. On Corel 5K, the precision, recall and F1 value of GAN-W are 5, 14 and 9 percentage points higher, respectively, than those of Convolutional Neural Network Regression (CNN-R); on IAPRTC-12, they are 2, 6 and 3 percentage points higher, respectively, than those of Two-Pass K-Nearest Neighbor (2PKNN). The experimental results show that GAN-W removes the dependence of the output neuron count on vocabulary size. In addition, the number of labels assigned to each image is adaptive, which brings the annotation results closer to actual annotation practice.

Key words: automatic image annotation, deep learning, Generative Adversarial Network (GAN), label vectorization, transfer learning
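
The final step described in the abstract (ranking multiple model outputs to obtain the annotation) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 3-dimensional toy vectors stand in for real Word2vec embeddings, `sample_outputs` stands in for repeated generator outputs for one image, and the nearest-word voting rule with a `min_share` cutoff is one plausible way to rank outputs, not necessarily the authors' procedure.

```python
import math
from collections import Counter

# Hypothetical 3-D embeddings standing in for Word2vec vectors; real ones
# would come from a trained Word2vec model and have more dimensions.
WORD_VECTORS = {
    "sky":   (1.0, 0.0, 0.0),
    "water": (0.0, 1.0, 0.0),
    "tree":  (0.0, 0.0, 1.0),
    "sand":  (0.7, 0.7, 0.0),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_word(vec, vocab=WORD_VECTORS):
    """Map one output vector back to the most similar vocabulary word."""
    return max(vocab, key=lambda word: cosine(vec, vocab[word]))


def annotate(outputs, min_share=0.2, vocab=WORD_VECTORS):
    """Rank words by how many outputs map to them (assumed voting rule).

    Words receiving at least `min_share` of the votes are kept, so the
    number of labels per image is not fixed in advance.
    """
    votes = Counter(nearest_word(vec, vocab) for vec in outputs)
    return [word for word, n in votes.most_common()
            if n / len(outputs) >= min_share]


# Stand-in for six generator outputs produced for a single image.
sample_outputs = [
    (0.9, 0.1, 0.0), (0.95, 0.05, 0.1), (0.1, 0.9, 0.05),
    (0.8, 0.0, 0.2), (0.0, 0.95, 0.1), (0.1, 0.1, 0.9),
]

print(annotate(sample_outputs))
```

In this sketch, adding a word to `WORD_VECTORS` enlarges the vocabulary without changing the length of the output vectors, which mirrors the property the abstract claims for GAN-W; lowering `min_share` yields more labels for the same image.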

CLC number:

TP391.413

SHUI Liucheng, LIU Weizhong, FENG Zhuoming. Automatic image annotation based on generative adversarial network[J]. Journal of Computer Applications, 2019, 39(7): 2129-2133.
