Journal of Computer Applications, 2019, Vol. 39, Issue (11): 3204-3209. DOI: 10.11772/j.issn.1001-9081.2019051077

• Papers of the 2019 CCF Conference on Artificial Intelligence (CCFAI2019) •

Text-to-image synthesis method based on multi-level structure generative adversarial networks

SUN Yu1,2, LI Linyan3, YE Zihan1,4, HU Fuyuan1, XI Xuefeng1,5

  1. College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou Jiangsu 215009, China;
  2. Suzhou Key Laboratory for Big Data and Information Service, Suzhou Jiangsu 215009, China;
  3. Suzhou Institute of Trade and Commerce, Suzhou Jiangsu 215009, China;
  4. Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou Jiangsu 215009, China;
  5. Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou Jiangsu 215009, China
  • Received: 2019-05-24 Revised: 2019-06-28 Online: 2019-09-11 Published: 2019-11-10
  • Corresponding author: HU Fuyuan
  • About the authors: SUN Yu, born in 1995 in Jingjiang, Jiangsu, M.S. candidate, CCF member. His research interests include image processing, deep learning, and generative adversarial networks. LI Linyan, born in 1983 in Yueyang, Hunan, M.S., senior engineer. Her research interests include geographic information processing. YE Zihan, born in 1996 in Shangrao, Jiangxi, CCF member. His research interests include image processing, deep learning, and generative adversarial networks. HU Fuyuan, born in 1978 in Yueyang, Hunan, Ph.D., professor, CCF member. His research interests include image processing, pattern recognition, and information security. XI Xuefeng, born in 1978 in Suzhou, Jiangsu, Ph.D., associate professor, CCF member. His research interests include natural language processing, machine learning, and big data processing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61876121, 61472267), the Primary Research & Development Plan of Jiangsu Province (BE2017663), the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou (SZS201609), and the Graduate Research and Innovation Plan of Jiangsu Province (KYCX18_2549).

Abstract: In recent years, Generative Adversarial Networks (GANs) have achieved remarkable success in text-to-image synthesis, but problems remain, such as blurred image edges, unclear local textures, and low variance among generated samples. To address these shortcomings, a Multi-Level structure Generative Adversarial Networks (MLGAN) model was proposed on the basis of the Stacked Generative Adversarial Network model (StackGAN++). The model is composed of multiple generators and discriminators arranged in parallel in a hierarchical structure. Firstly, a hierarchical structure coding method and a word vector constraint were introduced to change the condition vector of the generator at each level of the network, so that the edge details and local textures of the images became clearer and more vivid. Then, the generators and discriminators were jointly trained so that the generated image distributions of multiple levels together approximated the real image distribution, which increased the variance, and hence the diversity, of the generated samples. Finally, images of the corresponding text at different scales were generated by the generators of different levels. Experimental results show that the Inception scores of the MLGAN model reached 4.22 and 3.88 on the CUB and Oxford-102 datasets respectively, which were 4.45% and 3.74% higher than those of StackGAN++. The MLGAN model alleviates edge blurring and unclear local textures in the generated images to some extent, and the images it generates are closer to real images.

Key words: Generative Adversarial Network (GAN), text-to-image synthesis, Multi-Level structure Generative Adversarial Networks (MLGAN), multi-level image distribution, hierarchical structure coding
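The abstract reports its results as Inception scores and as relative gains over StackGAN++. The sketch below, which is not taken from the paper, illustrates both quantities under stated assumptions. The function `inception_score` applies the standard definition, the exponential of the mean KL divergence between each image's class posterior p(y|x) and the marginal p(y); it assumes the posteriors have already been produced by some pretrained classifier, and it omits the averaging over several splits that reported scores often use. The helper `implied_baseline` is a hypothetical name that simply inverts a percentage gain. Both names and the `eps` smoothing constant are illustrative choices.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Inception score from class posteriors.

    probs: array of shape (N, K); row i is p(y | x_i) for generated image x_i.
    eps:   small constant (an assumption of this sketch) that avoids log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal class distribution p(y), averaged over all generated images.
    marginal = probs.mean(axis=0, keepdims=True)
    # KL(p(y|x) || p(y)) for every image.
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


def implied_baseline(score, relative_gain_percent):
    """Baseline score implied by a score and its relative gain in percent."""
    return score / (1.0 + relative_gain_percent / 100.0)


if __name__ == "__main__":
    # Uninformative posteriors give the minimum score of 1.
    print(inception_score(np.full((8, 4), 0.25)))
    # Confident, evenly spread posteriors approach the number of classes.
    print(inception_score(np.eye(5)))
    # Baselines implied by the abstract's figures (derived here, not quoted
    # from the paper): about 4.04 on CUB and about 3.74 on Oxford-102.
    print(round(implied_baseline(4.22, 4.45), 2))
    print(round(implied_baseline(3.88, 3.74), 2))
```

The two toy inputs mark the range of the score: it is 1 when every image receives the same posterior and approaches the number of classes when predictions are confident and evenly spread across classes, which is why a higher score is read as both sharper and more diverse output.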

CLC number: