Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3651-3657.DOI: 10.11772/j.issn.1001-9081.2020040522

• Virtual reality and multimedia computing •

Semantic face image inpainting based on U-Net with dense blocks

YANG Wenxia1, WANG Meng2, ZHANG Liang1   

  1. School of Science, Wuhan University of Technology, Wuhan Hubei 430070, China;
    2. Tus College of Digit, Guangxi University of Science and Technology, Liuzhou Guangxi 545006, China
  • Received: 2020-04-23 Revised: 2020-06-23 Online: 2020-12-10 Published: 2020-07-30
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61573012), the China Scholarship Council (201906955038), and the Science and Technology Program of Liuzhou (2018DH10503).

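  • Corresponding author: WANG Meng (born 1979), male, a native of Wuhan, Hubei; associate professor, M.S.; research interest: natural language understanding. mwang007@gxust.edu.cn
  • About the authors: YANG Wenxia (born 1978), female, a native of Tianmen, Hubei; associate professor, Ph.D.; research interests: digital image processing and pattern recognition. ZHANG Liang (born 1977), male, a native of Wuhan, Hubei; professor, Ph.D.; research interests: controllability and stabilizability of partial differential equations.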

Abstract: When the region to be inpainted in a face image is large, existing methods produce visual defects such as semantically implausible content and incoherent boundaries. To address this problem, an end-to-end image inpainting model with a U-Net structure based on dense blocks was proposed for semantic face inpainting under arbitrary masks. Firstly, following the idea of the generative adversarial network, the convolutional layers of the U-Net generator were replaced with dense blocks, which capture the semantic information of the missing regions and ensure that features from earlier layers are reused. Then, skip connections were adopted to reduce the information loss caused by down-sampling, which helps extract the semantics of the missing regions. Finally, the generator was trained with a joint loss function combining adversarial loss, content loss and local Total Variation (TV) loss to ensure visual consistency between the inpainted boundary and the surrounding real image, and the discriminator was trained with Hinge loss. The proposed model was compared with Globally and Locally Consistent image completion (GLC), Deep Fusion (DF) and Gated Convolution (GC) on the CelebA-HQ face dataset. Experimental results show that the proposed model effectively extracts the semantic information of face images, and that its inpainting results have naturally transitioning boundaries and clear local details. Compared with GC, the second-best method, the proposed model increases the Structural SIMilarity index (SSIM) and Peak Signal-to-Noise Ratio (PSNR) by 5.68% and 7.87% respectively and decreases the Fréchet Inception Distance (FID) by 7.86% for central masks; for random masks, it increases SSIM and PSNR by 7.06% and 4.80% respectively and decreases FID by 6.85%.
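
The loss terms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas or weights, so the code assumes the common hinge formulation for the adversarial terms, an L1 content loss, and a "local" TV term that only counts neighboring pixel pairs touching the masked region. All function names and the weights `w_adv`, `w_content` and `w_tv` are hypothetical. Images are plain nested lists of grayscale values to keep the example free of dependencies.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def hinge_d_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Discriminator hinge loss: mean(max(0, 1 - D(real))) + mean(max(0, 1 + D(fake)))."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def hinge_g_loss(fake_scores: Sequence[float]) -> float:
    """Generator adversarial term paired with the hinge loss (assumed form): -mean(D(fake))."""
    return -sum(fake_scores) / len(fake_scores)


def content_loss(output: Image, target: Image) -> float:
    """Mean absolute error between output and ground truth (assumption: L1 content loss)."""
    h, w = len(target), len(target[0])
    total = sum(abs(output[i][j] - target[i][j]) for i in range(h) for j in range(w))
    return total / (h * w)


def local_tv_loss(image: Image, mask: List[List[int]]) -> float:
    """Anisotropic TV over neighbor pairs with at least one pixel inside the mask (mask == 1)."""
    h, w = len(image), len(image[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            # Horizontal neighbor pair (i, j) and (i, j + 1).
            if j + 1 < w and (mask[i][j] or mask[i][j + 1]):
                total += abs(image[i][j + 1] - image[i][j])
            # Vertical neighbor pair (i, j) and (i + 1, j).
            if i + 1 < h and (mask[i][j] or mask[i + 1][j]):
                total += abs(image[i + 1][j] - image[i][j])
    return total


def generator_loss(output: Image, target: Image, mask: List[List[int]],
                   fake_scores: Sequence[float],
                   w_adv: float = 1.0, w_content: float = 1.0, w_tv: float = 0.1) -> float:
    """Weighted sum of the three generator terms; the default weights are placeholders."""
    return (w_adv * hinge_g_loss(fake_scores)
            + w_content * content_loss(output, target)
            + w_tv * local_tv_loss(output, mask))
```

Restricting the TV term to pairs that touch the mask penalizes abrupt intensity changes only in and around the inpainted region, which matches the abstract's stated goal of a visually consistent boundary while leaving the known pixels unconstrained.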

Key words: semantic image inpainting, generative adversarial network, dense block, loss function, local Total Variation (TV), encoder-decoder
