Text-to-image synthesis method based on multi-level structure generative adversarial networks
SUN Yu1,2, LI Linyan3, YE Zihan1,4, HU Fuyuan1, XI Xuefeng1,5
1. College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou Jiangsu 215009, China; 2. Suzhou Key Laboratory for Big Data and Information Service, Suzhou Jiangsu 215009, China; 3. Suzhou Institute of Trade and Commerce, Suzhou Jiangsu 215009, China; 4. Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou Jiangsu 215009, China; 5. Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou Jiangsu 215009, China
Abstract: In recent years, Generative Adversarial Networks (GANs) have achieved remarkable success in text-to-image synthesis, but problems remain, such as blurred image edges, unclear local textures, and low variance among generated samples. To address these shortcomings, a Multi-Level structure Generative Adversarial Network (MLGAN) model, composed of multiple generators and discriminators arranged hierarchically, was proposed on the basis of the Stacked Generative Adversarial Network model (StackGAN++). Firstly, a hierarchical structure coding method and a word vector constraint were introduced to change the condition vector of the generator at each level of the network, so that the edge details and local textures of the generated images became clearer and more vivid. Then, the generators and discriminators were jointly trained so that the image distributions generated at multiple levels together approximate the real image distribution, enlarging the variance of the generated samples and increasing their diversity. Finally, generators at different levels produced images of the corresponding text at different scales. The experimental results show that the Inception scores of the MLGAN model reached 4.22 and 3.88 on the CUB and Oxford-102 datasets, which are 4.45% and 3.74% higher, respectively, than those of StackGAN++. The MLGAN model alleviates the edge blurring and unclear local textures of generated images, and the images it generates are closer to real images.
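The hierarchical conditioning described in the abstract lends itself to a compact sketch. The following is a minimal, illustrative PyTorch sketch (the paper's own implementation uses TensorFlow [20]) of a multi-level generator chain in which each level receives its own condition vector built from the sentence embedding and a pooled word-vector summary, standing in for the paper's hierarchical structure coding and word vector constraint. All names, dimensions, and per-level resolutions here are assumptions for illustration, not the authors' exact architecture.

import torch
import torch.nn as nn

class StageGenerator(nn.Module):
    """One level of the hierarchy: fuses features with that level's
    condition vector, upsamples, and emits an image at its own scale."""
    def __init__(self, in_ch, cond_dim):
        super().__init__()
        self.fuse = nn.Conv2d(in_ch + cond_dim, in_ch, 3, padding=1)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, in_ch // 2, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_img = nn.Conv2d(in_ch // 2, 3, 3, padding=1)

    def forward(self, h, cond):
        # Broadcast the level-specific condition vector over the spatial grid,
        # concatenate with the features, fuse, then upsample.
        c = cond[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
        h = torch.relu(self.fuse(torch.cat([h, c], dim=1)))
        h = self.up(h)
        return h, torch.tanh(self.to_img(h))

class MLGANGenerator(nn.Module):
    """Chains several StageGenerators; each level gets its own condition
    vector derived from the sentence embedding and a mean-pooled word-vector
    summary (a hypothetical stand-in for the word vector constraint)."""
    def __init__(self, z_dim=100, sent_dim=128, word_dim=128,
                 cond_dim=64, base_ch=256, levels=3):
        super().__init__()
        self.base_ch = base_ch
        self.fc = nn.Linear(z_dim + sent_dim, base_ch * 4 * 4)
        # One projection per level -> a hierarchical condition coding.
        self.cond_proj = nn.ModuleList(
            nn.Linear(sent_dim + word_dim, cond_dim) for _ in range(levels)
        )
        chans = [base_ch // (2 ** i) for i in range(levels)]
        self.stages = nn.ModuleList(StageGenerator(c, cond_dim) for c in chans)

    def forward(self, z, sent_emb, word_embs):
        # word_embs: (batch, seq_len, word_dim); mean-pool as a simple summary.
        word_summary = word_embs.mean(dim=1)
        h = self.fc(torch.cat([z, sent_emb], dim=1))
        h = h.view(-1, self.base_ch, 4, 4)
        images = []
        for proj, stage in zip(self.cond_proj, self.stages):
            cond = torch.relu(proj(torch.cat([sent_emb, word_summary], dim=1)))
            h, img = stage(h, cond)
            images.append(img)  # one image per level, doubling in resolution
        return images

# Usage: three levels yield 8x8, 16x16 and 32x32 images from a 4x4 seed;
# the paper's actual scales (e.g. 64/128/256, as in StackGAN++) would need
# a deeper upsampling stack.
g = MLGANGenerator()
imgs = g(torch.randn(2, 100), torch.randn(2, 128), torch.randn(2, 18, 128))
print([tuple(i.shape) for i in imgs])  # [(2,3,8,8), (2,3,16,16), (2,3,32,32)]

Each level emits an image at twice the previous resolution, mirroring the coarse-to-fine design in which every generator is paired with a discriminator at its own scale; the per-level condition projections are what distinguish this sketch from a plain StackGAN++-style chain with a single shared condition vector.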
SUN Yu, LI Linyan, YE Zihan, HU Fuyuan, XI Xuefeng. Text-to-image synthesis method based on multi-level structure generative adversarial networks. Journal of Computer Applications, 2019, 39(11): 3204-3209.
[1] DOSOVITSKIY A, SPRINGENBERG J T, BROX T. Learning to generate chairs with convolutional neural networks[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 1538-1546.
[2] EHSANI K, BAGHERINEZHAD H, REDMON J, et al. Who let the dogs out? Modeling dog behavior from visual data[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 4051-4060.
[3] GUO Y X, CHEN L T, DONG Y. Inferring ambient occlusion from a single image[J]. Journal of Computer Research and Development, 2019, 56(2): 385-393.
[4] ZHAO S Y, LI J W. Generative adversarial network for generating low-rank images[J]. Acta Automatica Sinica, 2018, 44(5): 829-839.
[5] HE X Y, ZHANG X L. Pneumonia image recognition model based on deep neural network[J]. Journal of Computer Applications, 2019, 39(6): 1680-1684.
[6] REMATAS K, KEMELMACHER-SHLIZERMAN I, CURLESS B, et al. Soccer on your tabletop[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 4738-4747.
[7] van DEN OORD A, KALCHBRENNER N, KAVUKCUOGLU K. Pixel recurrent neural networks[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1747-1756.
[8] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2014: 2672-2680.
[9] PAN Z, YU W, YI X, et al. Recent progress on Generative Adversarial Networks (GANs): a survey[J]. IEEE Access, 2019, 7: 36322-36333.
[10] CAO Y J, JIA L L, CHEN Y X, et al. Recent advances of generative adversarial networks in computer vision[J]. IEEE Access, 2019, 7: 14985-15006.
[11] WANG X, GUPTA A. Generative image modeling using style and structure adversarial networks[C]//Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 318-335.
[12] DENTON E L, CHINTALA S, SZLAM A, et al. Deep generative image models using a Laplacian pyramid of adversarial networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. New York: ACM, 2015: 1486-1494.
[13] DURUGKAR I, GEMP I, MAHADEVAN S. Generative multi-adversarial networks[EB/OL]. [2018-06-20]. https://www.taodocs.com/p-110588603.html.
[14] REED S, AKATA Z, YAN X, et al. Generative adversarial text-to-image synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1060-1069.
[15] REED S, AKATA Z, MOHAN S, et al. Learning what and where to draw[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 217-225.
[16] ZHANG Z, XIE Y, YANG L. Photographic text-to-image synthesis with a hierarchically-nested adversarial network[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 6199-6208.
[17] ZHANG H, XU T, LI H, et al. StackGAN++: realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962.
[18] LU J, YANG J, BATRA D, et al. Hierarchical question-image co-attention for visual question answering[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 289-297.
[19] HU B, LU Z, LI H, et al. Convolutional neural network architectures for matching natural language sentences[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. New York: ACM, 2014: 2042-2050.
[20] ABADI M, BARHAM P, CHEN J, et al. TensorFlow: a system for large-scale machine learning[C]//Proceedings of the 2016 Conference on Operating Systems Design and Implementation. Piscataway: IEEE, 2016: 265-283.
[21] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset: Computation & Neural Systems technical report[R]. Pasadena, CA, USA: California Institute of Technology, 2011.
[22] NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C]//Proceedings of the 6th Indian Conference on Computer Vision, Graphics & Image Processing. Piscataway: IEEE, 2008: 722-729.
[23] SALIMANS T, GOODFELLOW I J, ZAREMBA W, et al. Improved techniques for training GANs[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 2234-2242.