Text-to-image synthesis method based on multi-level structure generative adversarial networks
SUN Yu1,2, LI Linyan3, YE Zihan1,4, HU Fuyuan1, XI Xuefeng1,5
1. College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou Jiangsu 215009, China; 2. Suzhou Key Laboratory for Big Data and Information Service, Suzhou Jiangsu 215009, China; 3. Suzhou Institute of Trade and Commerce, Suzhou Jiangsu 215009, China; 4. Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou Jiangsu 215009, China; 5. Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou Jiangsu 215009, China
Abstract: In recent years, Generative Adversarial Networks (GANs) have achieved remarkable success in text-to-image synthesis, but problems remain, such as blurred image edges, unclear local textures, and low variance among generated samples. To address these shortcomings, a Multi-Level structure Generative Adversarial Network (MLGAN) model, composed of multiple generators and discriminators arranged hierarchically, was proposed on the basis of the Stacked Generative Adversarial Network model (StackGAN++). Firstly, a hierarchical structure coding method and a word vector constraint were introduced to change the condition vector of the generator at each level of the network, so that the edge details and local textures of the generated images became clearer and more vivid. Then, the generators and discriminators were jointly trained so that the image distributions generated at multiple levels together approximate the real image distribution, enlarging the variance of the generated samples and increasing their diversity. Finally, generators at different levels produced images of the corresponding text at different scales. The experimental results show that the Inception scores of the MLGAN model reached 4.22 and 3.88 on the CUB and Oxford-102 datasets, which are 4.45% and 3.74% higher, respectively, than those of StackGAN++. The MLGAN model alleviates the edge blurring and unclear local textures of generated images, and the images it generates are closer to real images.
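The hierarchical conditioning described in the abstract lends itself to a compact sketch. The following is a minimal, illustrative PyTorch sketch (the paper's own implementation uses TensorFlow [20]) of a multi-level generator chain in which each level receives its own condition vector built from the sentence embedding and a pooled word-vector summary, standing in for the paper's hierarchical structure coding and word vector constraint. All names, dimensions, and per-level resolutions here are assumptions for illustration, not the authors' exact architecture.

import torch
import torch.nn as nn

class StageGenerator(nn.Module):
    """One level of the hierarchy: fuses features with that level's
    condition vector, upsamples, and emits an image at its own scale."""
    def __init__(self, in_ch, cond_dim):
        super().__init__()
        self.fuse = nn.Conv2d(in_ch + cond_dim, in_ch, 3, padding=1)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, in_ch // 2, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_img = nn.Conv2d(in_ch // 2, 3, 3, padding=1)

    def forward(self, h, cond):
        # Broadcast the level-specific condition vector over the spatial grid,
        # concatenate with the features, fuse, then upsample.
        c = cond[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
        h = torch.relu(self.fuse(torch.cat([h, c], dim=1)))
        h = self.up(h)
        return h, torch.tanh(self.to_img(h))

class MLGANGenerator(nn.Module):
    """Chains several StageGenerators; each level gets its own condition
    vector derived from the sentence embedding and a mean-pooled word-vector
    summary (a hypothetical stand-in for the word vector constraint)."""
    def __init__(self, z_dim=100, sent_dim=128, word_dim=128,
                 cond_dim=64, base_ch=256, levels=3):
        super().__init__()
        self.base_ch = base_ch
        self.fc = nn.Linear(z_dim + sent_dim, base_ch * 4 * 4)
        # One projection per level -> a hierarchical condition coding.
        self.cond_proj = nn.ModuleList(
            nn.Linear(sent_dim + word_dim, cond_dim) for _ in range(levels)
        )
        chans = [base_ch // (2 ** i) for i in range(levels)]
        self.stages = nn.ModuleList(StageGenerator(c, cond_dim) for c in chans)

    def forward(self, z, sent_emb, word_embs):
        # word_embs: (batch, seq_len, word_dim); mean-pool as a simple summary.
        word_summary = word_embs.mean(dim=1)
        h = self.fc(torch.cat([z, sent_emb], dim=1))
        h = h.view(-1, self.base_ch, 4, 4)
        images = []
        for proj, stage in zip(self.cond_proj, self.stages):
            cond = torch.relu(proj(torch.cat([sent_emb, word_summary], dim=1)))
            h, img = stage(h, cond)
            images.append(img)  # one image per level, doubling in resolution
        return images

# Usage: three levels yield 8x8, 16x16 and 32x32 images from a 4x4 seed;
# the paper's actual scales (e.g. 64/128/256, as in StackGAN++) would need
# a deeper upsampling stack.
g = MLGANGenerator()
imgs = g(torch.randn(2, 100), torch.randn(2, 128), torch.randn(2, 18, 128))
print([tuple(i.shape) for i in imgs])  # [(2,3,8,8), (2,3,16,16), (2,3,32,32)]

Each level emits an image at twice the previous resolution, mirroring the coarse-to-fine design in which every generator is paired with a discriminator at its own scale; the per-level condition projections are what distinguish this sketch from a plain StackGAN++-style chain with a single shared condition vector.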
SUN Yu, LI Linyan, YE Zihan, HU Fuyuan, XI Xuefeng. Text-to-image synthesis method based on multi-level structure generative adversarial networks. Journal of Computer Applications, 2019, 39(11): 3204-3209.
[1] DOSOVITSKIY A, SPRINGENBERG J T, BROX T. Learning to generate chairs with convolutional neural networks[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 1538-1546.
[2] EHSANI K, BAGHERINEZHAD H, REDMON J, et al. Who let the dogs out? Modeling dog behavior from visual data[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 4051-4060.
[3] GUO Y X, CHEN L T, DONG Y. Inferring ambient occlusion from a single image[J]. Journal of Computer Research and Development, 2019, 56(2): 385-393.
[4] ZHAO S Y, LI J W. Generative adversarial network for generating low-rank images[J]. Acta Automatica Sinica, 2018, 44(5): 829-839.
[5] HE X Y, ZHANG X L. Pneumonia image recognition model based on deep neural network[J]. Journal of Computer Applications, 2019, 39(6): 1680-1684.
[6] REMATAS K, KEMELMACHER-SHLIZERMAN I, CURLESS B, et al. Soccer on your tabletop[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 4738-4747.
[7] van DEN OORD A, KALCHBRENNER N, KAVUKCUOGLU K. Pixel recurrent neural networks[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1747-1756.
[8] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2014: 2672-2680.
[9] PAN Z, YU W, YI X, et al. Recent progress on Generative Adversarial Networks (GANs): a survey[J]. IEEE Access, 2019, 7: 36322-36333.
[10] CAO Y J, JIA L L, CHEN Y X, et al. Recent advances of generative adversarial networks in computer vision[J]. IEEE Access, 2019, 7: 14985-15006.
[11] WANG X, GUPTA A. Generative image modeling using style and structure adversarial networks[C]//Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 318-335.
[12] DENTON E L, CHINTALA S, SZLAM A, et al. Deep generative image models using a Laplacian pyramid of adversarial networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. New York: ACM, 2015: 1486-1494.
[13] DURUGKAR I, GEMP I, MAHADEVAN S. Generative multi-adversarial networks[EB/OL]. [2018-06-20]. https://www.taodocs.com/p-110588603.html.
[14] REED S, AKATA Z, YAN X, et al. Generative adversarial text-to-image synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1060-1069.
[15] REED S, AKATA Z, MOHAN S, et al. Learning what and where to draw[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 217-225.
[16] ZHANG Z, XIE Y, YANG L. Photographic text-to-image synthesis with a hierarchically-nested adversarial network[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 6199-6208.
[17] ZHANG H, XU T, LI H, et al. StackGAN++: realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962.
[18] LU J, YANG J, BATRA D, et al. Hierarchical question-image co-attention for visual question answering[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 289-297.
[19] HU B, LU Z, LI H, et al. Convolutional neural network architectures for matching natural language sentences[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. New York: ACM, 2014: 2042-2050.
[20] ABADI M, BARHAM P, CHEN J, et al. TensorFlow: a system for large-scale machine learning[C]//Proceedings of the 2016 Conference on Operating Systems Design and Implementation. Piscataway: IEEE, 2016: 265-283.
[21] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset: Computation & Neural Systems technical report[R]. Pasadena, CA, USA: California Institute of Technology, 2011.
[22] NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C]//Proceedings of the 6th Indian Conference on Computer Vision, Graphics & Image Processing. Piscataway: IEEE, 2008: 722-729.
[23] SALIMANS T, GOODFELLOW I J, ZAREMBA W, et al. Improved techniques for training GANs[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 2234-2242.