Text-to-image synthesis method based on multi-level progressive resolution generative adversarial networks

doi:10.11772/j.issn.1001-9081.2020040575

Journal of Computer Applications, 2020, Vol. 40, Issue 12: 3612-3617.

• Virtual reality and multimedia computing •

XU Yining, HE Xiaohai, ZHANG Jin, QING Linbo

College of Electronics and Information Engineering, Sichuan University, Chengdu Sichuan 610065, China

Received: 2020-05-05  Revised: 2020-07-09  Online: 2020-12-10  Published: 2020-07-24
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61871278), the Sichuan Science and Technology Program (2018HH0143), the Sichuan Education Department Program (18ZB0355), and the Industrial Cluster Collaborative Innovation Project of Chengdu (2016-XT00-00015-GX).

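Corresponding author: HE Xiaohai (born 1964), male, from Mianyang, Sichuan; professor, Ph.D.; main research interests: image processing, pattern recognition, image communication. hxh@scu.edu.cn
About the authors: XU Yining (born 1996), female, from Quanzhou, Fujian; M.S. candidate; main research interests: computer vision, deep learning, image generation. ZHANG Jin (born 1995), female, from Datong, Shanxi; Ph.D. candidate; main research interests: computer vision, deep learning, image processing. QING Linbo (born 1982), male, from Ziyang, Sichuan; associate professor, Ph.D.; main research interests: image processing, pattern recognition, video communication.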

Abstract

To address incorrect object structures and blurry textures in the results of text-to-image synthesis, a Multi-level Progressive Resolution Generative Adversarial Network (MPRGAN) model was proposed on the basis of the Attentional Generative Adversarial Network (AttnGAN). Firstly, a semantic separation-fusion generation module was used in the low-resolution layer: guided by a self-attention mechanism, the text feature was separated into three feature vectors, and each vector was used to generate its own feature map. Then, the feature maps were fused into a low-resolution map, and mask images were used as semantic constraints to improve the stability of the low-resolution generator. Finally, a progressive resolution residual structure was adopted in the high-resolution layers, combined with a word attention mechanism and pixel shuffle to further improve the quality of the generated images. Experimental results showed that the Inception Score (IS) of the proposed model reached 4.70 and 3.53 on the Caltech-UCSD Birds-200-2011 (CUB-200-2011) and 102 Category Flower (Oxford-102) datasets respectively, which were 7.80% and 3.82% higher than those of AttnGAN. The MPRGAN model can alleviate the instability of structure generation to a certain extent, and the images it generates are closer to real images.
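The abstract names pixel shuffle as one of the upsampling ingredients in the high-resolution layers. As a rough illustration only, the sketch below shows the generic pixel shuffle (sub-pixel) rearrangement in plain Python on nested lists. It is not the authors' MPRGAN code: the function name `pixel_shuffle`, the nested-list representation, and the channel ordering (the one commonly used by deep learning libraries) are assumptions made for this example.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Hypothetical helper for illustration; assumes the common ordering in
    which input channel c*r*r + i*r + j fills output offset (i, j).
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    # Allocate the upsampled output: fewer channels, r times larger per side.
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        # Each input pixel lands in an r-by-r output cell.
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out


# Four 1x1 channels become one 2x2 channel.
feature = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(feature, 2))  # [[[1, 2], [3, 4]]]
```

The operation only moves values from the channel dimension into space, so the number of elements is unchanged; any learning happens in the convolution that precedes it.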

Key words: text-to-image synthesis, Generative Adversarial Network (GAN), self-attention mechanism, residual structure, pixel shuffle

CLC Number: TP391.41

XU Yining, HE Xiaohai, ZHANG Jin, QING Linbo. Text-to-image synthesis method based on multi-level progressive resolution generative adversarial networks[J]. Journal of Computer Applications, 2020, 40(12): 3612-3617.

References

[1] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the 2014 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680.
[2] REED S E, AKATA Z, YAN X, et al. Generative adversarial text to image synthesis[C]//Proceedings of the 2016 33rd International Conference on Machine Learning. New York: ACM, 2016: 1060-1069.
[3] HUANG H, YU P S, WANG C. An introduction to image synthesis with generative adversarial nets[EB/OL]. [2020-04-22]. https://arxiv.org/pdf/1803.04469.pdf.
[4] MIRZA M, OSINDERO S. Conditional generative adversarial nets[EB/OL]. [2020-04-22]. https://arxiv.org/pdf/1411.1784.pdf.
[5] ZHANG H, XU T, LI H, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5908-5916.
[6] ZHANG Z, XIE Y, YANG L. Photographic text-to-image synthesis with a hierarchically-nested adversarial network[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6199-6208.
[7] SUN Y, LI L Y, YE Z H, et al. Text-to-image synthesis method based on multi-level structure generative adversarial networks[J]. Journal of Computer Applications, 2019, 39(11): 3204-3209. (in Chinese)
[8] HUANG H Y, GU Z F. A generative adversarial network base on self-attention mechanism for text-to-image generation[J]. Journal of Chongqing University, 2020, 43(3): 55-61. (in Chinese)
[9] XU T, ZHANG P, HUANG Q, et al. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 1316-1324.
[10] YANG Z, YUAN Y, WU Y, et al. Review networks for caption generation[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 2361-2369.
[11] QIAO T, ZHANG J, XU D, et al. MirrorGAN: learning text-to-image generation by redescription[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1505-1514.
[12] MO J W, XU K L, LIN L P, et al. Text-to-image generation combined with mutual information maximization[J]. Journal of Xidian University, 2019, 46(5): 180-188. (in Chinese)
[13] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2818-2826.
[14] KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4396-4405.
[15] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[16] LI Z, YANG J, LIU Z, et al. Feedback network for image super-resolution[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3862-3871.
[17] SHI W, CABALLERO J, HUSZÁR F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1874-1883.
[18] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset: CNS-TR-2011-001[R]. Pasadena: California Institute of Technology, 2011.
[19] NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C]//Proceedings of the 2008 6th Indian Conference on Computer Vision, Graphics and Image Processing. Piscataway: IEEE, 2008: 722-729.
[20] SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[C]//Proceedings of the 2016 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 2234-2242.
[21] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//Proceedings of the 2017 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6626-6637.

Related Articles 15

[1]	GUAN Qijie, ZHANG Ting, LI Deya, ZHOU Shaojing, DU Yi. Indefinite reconstruction method of spatial data based on multi-resolution generative adversarial network [J]. Journal of Computer Applications, 2021, 41(8): 2306-2311.
[2]	SUN Xiao, XU Jindong. Remote sensing image dehazing method based on cascaded generative adversarial network [J]. Journal of Computer Applications, 2021, 41(8): 2440-2444.
[3]	DANG Weichao, LI Tao, BAI Shangwang, GAO Gaimei, LIU Chunxia. Real-time remaining life prediction method of Web software system based on self-attention-long short-term memory network [J]. Journal of Computer Applications, 2021, 41(8): 2346-2351.
[4]	TANG Guihua, SUN Lei, MAO Xiuqing, DAI Leyu, HU Yongjin. Generative adversarial network synthesized face detection based on deep alignment network [J]. Journal of Computer Applications, 2021, 41(7): 1922-1927.
[5]	WANG Xianwu, ZHANG Ting, JI Xin, DU Yi. 3D shale digital core reconstruction method based on deep convolutional generative adversarial network with gradient penalty [J]. Journal of Computer Applications, 2021, 41(6): 1805-1811.
[6]	LI Yanzhi, FAN Yong, GAO Lin. Anomaly detection of oil drilling water flow based on shape flow [J]. Journal of Computer Applications, 2021, 41(6): 1842-1848.
[7]	GUO Maozu, YANG Qiannan, ZHAO Lingling. Image generation based on conditional-Wasserstein generative adversarial network [J]. Journal of Computer Applications, 2021, 41(5): 1432-1437.
[8]	SUN Heli, SUN Yuzhu, ZHANG Xiaoyun. Event description generation based on generative adversarial network [J]. Journal of Computer Applications, 2021, 41(5): 1256-1261.
[9]	OU Lili, SHAO Fengjing, SUN Rencheng, SUI Yi. Cerebral infarction image recognition based on semi-supervised method [J]. Journal of Computer Applications, 2021, 41(4): 1221-1226.
[10]	DUAN Youxiang, ZHANG Hanxiao, SUN Qifeng, SUN Youkai. Image super-resolution reconstruction algorithm based on Laplacian pyramid generative adversarial network [J]. Journal of Computer Applications, 2021, 41(4): 1020-1026.
[11]	LI Hongxia, QIN Pinle, YAN Hanmei, ZENG Jianchao, BAO Qianyue, CHAI Rui. Face frontalization generative adversarial network algorithm based on face feature map symmetry [J]. Journal of Computer Applications, 2021, 41(3): 714-720.
[12]	ZHANG Ya, JIN Xin, JIANG Qian, LEE Shin-jye, DONG Yunyun, YAO Shaowen. Deepfake image detection method based on autoencoder [J]. Journal of Computer Applications, 2021, 41(10): 2985-2990.
[13]	QIU Yaoru, SUN Weijun, HUANG Yonghui, TANG Yuqi, ZHANG Haochuan, WU Junpeng. Person re-identification method based on GAN uniting with spatial-temporal pattern [J]. Journal of Computer Applications, 2020, 40(9): 2493-2498.
[14]	CHEN Jiawei, HAN Fang, WANG Zhijie. Aspect-based sentiment analysis with self-attention gated graph convolutional network [J]. Journal of Computer Applications, 2020, 40(8): 2202-2206.
[15]	LI Shengwu, ZHANG Xuande. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking [J]. Journal of Computer Applications, 2020, 40(8): 2219-2224.