Image generation based on semantic labels and noise prior

doi:10.11772/j.issn.1001-9081.2019101757

Journal of Computer Applications, 2020, Vol. 40, Issue (5): 1431-1439. DOI: 10.11772/j.issn.1001-9081.2019101757

• Virtual reality and multimedia computing •

ZHANG Susu, NI Jiancheng, ZHOU Zili, HOU Jie

School of Software, Qufu Normal University, Qufu, Shandong 273165, China

Received: 2019-10-16 Revised: 2019-12-11 Online: 2020-05-15 Published: 2020-05-10
Corresponding author: NI Jiancheng (born 1971)
About the authors: ZHANG Susu (born 1997), female, from Heze, Shandong, M.S. candidate. Her research interests include computer vision, image generation, and deep learning. NI Jiancheng (born 1971), male, from Jining, Shandong, Ph.D., professor, CCF senior member. His research interests include computer vision, machine learning, and distributed computing. ZHOU Zili (born 1973), male, from Heze, Shandong, Ph.D., associate professor, CCF senior member. His research interests include intelligent information processing, embedded intelligence, and knowledge engineering. HOU Jie (born 1996), female, from Jining, Shandong, M.S. candidate. Her research interests include computer vision and deep learning.
Supported by:
This work is partially supported by the Youth Program of the National Natural Science Foundation of China (61601261), the Program of Graduate Education Quality Improvement of Shandong Province (SDYY17136), and the Interdisciplinary Research Project of Qufu Normal University (QFNUSKC291809120).

Abstract

Existing generative models have difficulty generating high-resolution images directly from complex semantic labels. To address this problem, a Generative Adversarial Network based on Semantic Labels and Noise Prior (SLNP-GAN) is proposed. First, the semantic labels (which carry shape, position, and category information) are fed in directly and encoded by a global generator, which combines them with a noise prior to learn coarse-grained global attributes and synthesize an initial low-resolution image. Then, using an attention mechanism, a local refinement generator queries the high-resolution sub-labels that correspond to sub-regions of the low-resolution image, obtaining fine-grained information and generating complex images with clear textures. Finally, an improved Adam with Momentum (AMM) algorithm is used to optimize the adversarial training. Experimental results show that, compared with the existing method text2img, the proposed method increases Pixel Accuracy (PA) by 23.73% and 11.09% on the COCO_Stuff and ADE20K datasets, respectively; compared with the Adam algorithm, AMM roughly doubles the convergence speed and shows smaller fluctuations in the loss value. These results indicate that SLNP-GAN can efficiently capture global features and local textures and generate fine-grained, high-quality images.
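The Pixel Accuracy (PA) figure reported above is, in its usual definition, the fraction of pixels whose predicted class matches the ground-truth class. The following is a minimal sketch of that definition only; the function name and the toy label maps are illustrative assumptions, and how the predicted label map is obtained from a generated image is not described here.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class equals the ground-truth class.

    pred and target are 2-D label maps (lists of rows of integer class ids).
    This is the common definition of PA, not code taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("label maps must have the same number of rows")
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("rows must have the same length")
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)  # count matching pixels
            total += 1
    if total == 0:
        raise ValueError("label maps must not be empty")
    return correct / total


# Toy 2x3 label maps (hypothetical data): 5 of the 6 pixels agree.
pred = [[0, 1, 1], [2, 2, 0]]
target = [[0, 1, 1], [2, 1, 0]]
pa = pixel_accuracy(pred, target)
```

In this toy case `pa` is 5/6, since exactly one pixel is mislabeled.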

Keywords: semantic label, noise prior, attention mechanism, Adam with Momentum (AMM) algorithm, Generative Adversarial Network (GAN)
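The abstract compares AMM against the standard Adam optimizer but does not state the AMM update rule. For orientation, the sketch below shows only the standard Adam baseline on a toy scalar problem. The function name, the hyperparameter values, and the quadratic objective are illustrative assumptions, and the AMM modification itself is not reproduced.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update for a scalar parameter; t is the 1-based step.

    This is the baseline optimizer, not the paper's AMM variant.
    """
    m = beta1 * m + (1 - beta1) * grad         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective (an assumption for illustration): f(x) = (x - 3)^2, start at x = 0.
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3.0)  # derivative of (x - 3)^2
    x, m, v = adam_step(x, grad, m, v, t, lr=0.05)
```

After the loop, `x` settles close to the minimizer 3. Any comparison of convergence speed or loss fluctuation with AMM would need the update rule given in the full paper.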

CLC number:

TP391.4

ZHANG Susu, NI Jiancheng, ZHOU Zili, HOU Jie. Image generation based on semantic labels and noise prior[J]. Journal of Computer Applications, 2020, 40(5): 1431-1439.


