Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3612-3617.DOI: 10.11772/j.issn.1001-9081.2020040575

• Virtual reality and multimedia computing •

Text-to-image synthesis method based on multi-level progressive resolution generative adversarial networks

XU Yining, HE Xiaohai, ZHANG Jin, QING Linbo   

  1. College of Electronics and Information Engineering, Sichuan University, Chengdu Sichuan 610065, China
  • Received:2020-05-05 Revised:2020-07-09 Online:2020-12-10 Published:2020-07-24
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61871278), the Sichuan Science and Technology Program (2018HH0143), the Sichuan Education Department Program (18ZB0355), and the Industrial Cluster Collaborative Innovation Project of Chengdu (2016-XT00-00015-GX).

基于多层次分辨率递进生成对抗网络的文本生成图像方法

许一宁, 何小海, 张津, 卿粼波   

  1. 四川大学 电子信息学院, 成都 610065
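  • Corresponding author: HE Xiaohai (born 1964), male, from Mianyang, Sichuan, professor, Ph.D.; main research interests: image processing, pattern recognition, image communication. hxh@scu.edu.cn
  • About the authors: XU Yining (born 1996), female, from Quanzhou, Fujian, master's student; main research interests: computer vision, deep learning, image generation. ZHANG Jin (born 1995), female, from Datong, Shanxi, Ph.D. candidate; main research interests: computer vision, deep learning, image processing. QING Linbo (born 1982), male, from Ziyang, Sichuan, associate professor, Ph.D.; main research interests: image processing, pattern recognition, video communication.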
  • 基金资助:
    国家自然科学基金资助项目(61871278);四川省科技计划项目(2018HH0143);四川省教育厅项目(18ZB0355);成都市产业集群协同创新项目(2016-XT00-00015-GX)。

Abstract: To address the incorrect target structures and unclear image textures found in text-to-image synthesis results, a Multi-level Progressive Resolution Generative Adversarial Network (MPRGAN) model was proposed based on the Attentional Generative Adversarial Network (AttnGAN). Firstly, a semantic separation-fusion generation module was used in the low-resolution layer: the text feature was separated into three feature vectors under the guidance of a self-attention mechanism, and each feature vector was used to generate its own feature map. Then, the feature maps were fused into a low-resolution map, and mask images were used as semantic constraints to improve the stability of the low-resolution generator. Finally, a progressive resolution residual structure was adopted in the high-resolution layers, combined with a word attention mechanism and pixel shuffle to further improve the quality of the generated images. Experimental results show that the Inception Score (IS) of the proposed model reaches 4.70 on the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset and 3.53 on the 102 Category Flower (Oxford-102) dataset, which are 7.80% and 3.82% higher than those of AttnGAN, respectively. The MPRGAN model can alleviate the instability of structure generation to a certain extent, and the images it generates are closer to real images.
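
The pixel shuffle operation named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `pixel_shuffle`, the upscale factor `r`, and the toy input are assumptions chosen for illustration, and the channel ordering follows the commonly used sub-pixel convolution convention. The idea is to trade channel depth for spatial resolution by rearranging a (C·r², H, W) feature map into (C, H·r, W·r).

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    Illustrative sketch only; `r` is an assumed upscale factor.
    """
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    # Split the channel axis into (C, r, r) sub-pixel offsets.
    x = x.reshape(c_out, r, r, h, w)
    # Interleave each offset axis with its spatial axis: (C, H, r, W, r).
    x = x.transpose(0, 3, 1, 4, 2)
    # Merge (H, r) and (W, r) into the upscaled height and width.
    return x.reshape(c_out, h * r, w * r)


# Hypothetical toy input: 4 channels of size 2x2, upscaled by a factor of 2.
features = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
upsampled = pixel_shuffle(features, 2)
print(upsampled.shape)
```

Each 2×2 block of the output gathers one value from each of the four input channels at the same spatial position, so the resolution is doubled without interpolation.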

Key words: text-to-image synthesis, Generative Adversarial Network (GAN), self-attention mechanism, residual structure, pixel shuffle

摘要: 针对文本生成图像任务存在生成图像有目标结构不合理、图像纹理不清晰等问题,在注意力生成对抗网络(AttnGAN)的基础上提出了多层次分辨率递进生成对抗网络(MPRGAN)模型。首先,在低分辨率层采用语义分离-融合生成模块,将文本特征在自注意力机制引导下分离为3个特征向量,并用这些特征向量分别生成特征图谱;然后,将特征图谱融合为低分辨率图谱,并采用mask图像作为语义约束以提高低分辨率生成器的稳定性;最后,在高分辨率层采用分辨率递进残差结构,同时结合词注意力机制和像素混洗来进一步改善生成图像的质量。实验结果表明,在数据集CUB-200-2011和Oxford-102上,所提模型的IS分别达到了4.70和3.53,与AttnGAN相比分别提高了7.80%和3.82%。MPRGAN模型能够在一定程度上解决结构生成不稳定的问题,同时其生成的图像也更接近真实图像。

关键词: 文本生成图像, 生成对抗网络, 自注意力机制, 残差结构, 像素混洗
