Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (5): 1431-1439. DOI: 10.11772/j.issn.1001-9081.2019101757

• Virtual reality and multimedia computing •

Image generation fusing semantic labels and noise prior

张素素, 倪建成, 周子力, 侯杰   

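  1. School of Software, Qufu Normal University, Qufu, Shandong 273165
  • Received: 2019-10-16 Revised: 2019-12-11 Online: 2020-05-10 Published: 2020-05-15
  • Corresponding author: 倪建成 (NI Jiancheng, born 1971)
  • About the authors: 张素素 (ZHANG Susu, born 1997), female, from Heze, Shandong, master's student; main research interests: computer vision, image generation, deep learning. 倪建成 (NI Jiancheng, born 1971), male, from Jining, Shandong, professor, Ph.D., CCF senior member; main research interests: computer vision, machine learning, distributed computing. 周子力 (ZHOU Zili, born 1973), male, from Heze, Shandong, associate professor, Ph.D., CCF senior member; main research interests: intelligent information processing, embedded intelligence, knowledge engineering. 侯杰 (HOU Jie, born 1996), female, from Jining, Shandong, master's student; main research interests: computer vision, deep learning.
  • Supported by:

    Youth Science Fund Project of the National Natural Science Foundation of China (61601261); Shandong Province Graduate Education Quality Improvement Program (SDYY17136); Interdisciplinary Research Project of Qufu Normal University (QFNUSKC291809120).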

Image generation based on semantic labels and noise prior

ZHANG Susu, NI Jiancheng, ZHOU Zili, HOU Jie   

  1. School of Software, Qufu Normal University, Qufu, Shandong 273165, China
  • Received: 2019-10-16 Revised: 2019-12-11 Online: 2020-05-10 Published: 2020-05-15
  • Contact: NI Jiancheng, born in 1971, Ph.D., professor. His research interests include computer vision, machine learning, and distributed computing.
  • About the authors: ZHANG Susu, born in 1997, M.S. candidate. Her research interests include computer vision, image generation, and deep learning. NI Jiancheng, born in 1971, Ph.D., professor. His research interests include computer vision, machine learning, and distributed computing. ZHOU Zili, born in 1973, Ph.D., associate professor. His research interests include intelligent information processing, embedded intelligence, and knowledge engineering. HOU Jie, born in 1996, M.S. candidate. Her research interests include computer vision and deep learning.
  • Supported by:

    This work is partially supported by the Youth Program of the National Natural Science Foundation of China (61601261), the Program of Graduate Education Quality Improvement of Shandong Province (SDYY17136), and the Interdisciplinary Research Project of Qufu Normal University (QFNUSKC291809120).

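Abstract:

To address the difficulty that existing generative models have in producing high-resolution images directly from complex semantic labels, a Generative Adversarial Network fusing Semantic Labels and Noise Prior (SLNP-GAN) is proposed. First, semantic labels (carrying shape, position, and category information) are taken directly as input and encoded by a global generator, which combines them with a noise prior to learn coarse-grained global attributes and synthesize an initial low-resolution image. Then, using an attention mechanism, a local refinement generator queries the high-resolution sub-labels corresponding to sub-regions of the low-resolution image to obtain fine-grained information, producing complex images with clear textures. Finally, an improved Adam with Momentum (AMM) algorithm is used to optimize the adversarial training. Experimental results show that, compared with the existing method text2img, the proposed method improves Pixel Accuracy (PA) by 23.73% and 11.09% on the COCO_Stuff and ADE20K datasets, respectively; compared with the Adam algorithm, AMM roughly doubles the convergence speed and shows smaller fluctuations in the loss value. SLNP-GAN can thus efficiently capture global features and local textures and generate fine-grained, high-quality images.

Keywords: semantic label, noise prior, attention mechanism, Adam with Momentum (AMM) algorithm, Generative Adversarial Network (GAN)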

Abstract:

Existing generative models have difficulty generating high-resolution images directly from complex semantic labels. To address this, a Generative Adversarial Network based on Semantic Labels and Noise Prior (SLNP-GAN) was proposed. Firstly, the semantic labels (including shape, position and category information) were used directly as input and encoded by the global generator, which combined them with the noise prior to learn coarse-grained global attributes and generate low-resolution images. Then, with the attention mechanism, the local refinement generator was used to query the high-resolution sub-labels corresponding to the sub-regions of the low-resolution images and obtain fine-grained information, so that complex images with clear textures were generated. Finally, the improved Adam with Momentum (AMM) algorithm was used to optimize the adversarial training. The experimental results show that, compared with the existing method text2img, the proposed method increases the Pixel Accuracy (PA) by 23.73% and 11.09% on the COCO_Stuff and ADE20K datasets, respectively; in comparison with the Adam algorithm, the AMM algorithm roughly doubles the convergence speed and has a smaller loss amplitude. These results indicate that SLNP-GAN can efficiently obtain global features as well as local textures and generate fine-grained, high-quality images.
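The abstract reports Pixel Accuracy (PA) without defining it. As a minimal sketch, assuming the standard semantic-segmentation definition (the fraction of pixels whose predicted class label equals the reference label), PA could be computed as below. The function name `pixel_accuracy` and the toy label maps are hypothetical and are not taken from the paper; how the predicted label maps are obtained from the generated images is not stated in the abstract.

```python
from typing import Sequence


def pixel_accuracy(pred: Sequence[Sequence[int]],
                   target: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels whose predicted label equals the reference label.

    Assumes the standard definition of Pixel Accuracy; both arguments are
    2-D label maps (rows of integer class ids) of identical shape.
    """
    if len(pred) != len(target):
        raise ValueError("label maps must have the same height")
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("label maps must have the same width")
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)
            total += 1
    if total == 0:
        raise ValueError("label maps must not be empty")
    return correct / total


# Toy 2x3 label maps with hypothetical class ids: 5 of 6 pixels agree.
reference = [[0, 0, 1], [2, 2, 1]]
predicted = [[0, 0, 1], [2, 1, 1]]
score = pixel_accuracy(predicted, reference)
```

In practice the label maps would be arrays covering whole datasets such as COCO_Stuff or ADE20K, and a vectorized implementation would be preferable; the nested loops here only make the definition explicit.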

Key words: semantic label, noise prior, attention mechanism, Adam with Momentum (AMM) algorithm, Generative Adversarial Network (GAN)

CLC number: