Existing generative models have difficulty generating high-resolution images directly from complex semantic labels. To address this, a Generative Adversarial Network based on Semantic Labels and Noise Prior (SLNP-GAN) was proposed. Firstly, the semantic labels (carrying shape, position and category information) were used directly as input: the global generator encoded them, learned coarse-grained global attributes in combination with the noise prior, and generated low-resolution images. Then, using the attention mechanism, the local refinement generator queried the high-resolution sub-labels corresponding to the sub-regions of the low-resolution images to obtain fine-grained information, and thereby generated complex images with clear textures. Finally, an improved Adam with Momentum (AMM) algorithm was introduced to optimize the adversarial training. Experimental results show that, compared with the existing method text2img, the proposed method increases Pixel Accuracy (PA) by 23.73% and 11.09% on the COCO_Stuff and ADE20K datasets respectively; compared with the Adam algorithm, the AMM algorithm doubles the convergence speed with a much smaller loss amplitude. These results indicate that SLNP-GAN can efficiently capture global features as well as local textures and generate fine-grained, high-quality images.
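The Pixel Accuracy (PA) figures above can be made concrete with a minimal sketch. It assumes PA has its usual segmentation meaning, namely the fraction of pixels whose predicted class label matches the ground-truth label; the abstract does not state how the label maps for generated images are obtained, so the function name `pixel_accuracy` and the toy label maps below are purely illustrative and not taken from the paper.

```python
from typing import Sequence


def pixel_accuracy(pred: Sequence[Sequence[int]],
                   target: Sequence[Sequence[int]]) -> float:
    """Return the fraction of pixels whose predicted label equals the target label.

    Both arguments are 2-D label maps (rows of integer class ids) of equal shape.
    This is the standard definition of pixel accuracy, assumed here; it is not
    the paper's evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("label maps must have the same number of rows")

    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("label maps must have the same number of columns")
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)
            total += 1

    if total == 0:
        raise ValueError("label maps must contain at least one pixel")
    return correct / total


# Hypothetical 2x3 label maps: 4 of the 6 pixels agree.
target_labels = [[0, 0, 1],
                 [2, 2, 1]]
predicted_labels = [[0, 1, 1],
                    [2, 2, 2]]
score = pixel_accuracy(predicted_labels, target_labels)
```

In this toy case four of the six pixels agree, so `score` is 4/6. In practice the predicted label map would typically come from running a segmentation network on the generated image, which is an assumption about the evaluation pipeline rather than something the abstract specifies.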