Journal of Computer Applications, official website


High-fidelity image editing based on fine-tuning of diffusion models

LIU Yusheng, XIAO Xuezhong

  1. Nanjing University of Posts and Telecommunications
  • Received: 2023-11-14  Revised: 2024-03-01  Online: 2024-03-12  Published: 2024-03-12
  • Corresponding author: XIAO Xuezhong
  • About the authors: LIU Yusheng, born in 2001, native of Xuzhou, Jiangsu, M. S. candidate. His research interests include computer vision and image compositing. XIAO Xuezhong, born in 1972, native of Nanjing, Jiangsu, Ph. D., associate professor. His research interests include computer graphics, computer vision, and machine learning.


Abstract: To address the problems of current mainstream image editing methods, namely that each supports only a single type of
editing task, is not user-friendly to operate, and yields low fidelity, a diffusion model-based method for high-fidelity image editing
was proposed. The method used the widely adopted Stable Diffusion model as its backbone network. First, the model was fine-tuned
with the Low-Rank Adaptation (LoRA) method so that it could reconstruct the original image more faithfully. Then, the fine-tuned
model took the image and a simple text prompt as input and performed inference through the designed framework to generate the
edited image. In addition, a dual-layer U-net structure was proposed as an extension of this method, both for image editing tasks
with specific requirements and as an exploration of video synthesis. Comparative experiments against the leading methods Imagic,
DiffEdit, and InstructPix2Pix on the TEdBench dataset showed that the method can perform a variety of editing tasks, including
non-rigid edits, with strong editability. Its Learned Perceptual Image Patch Similarity (LPIPS) score was 30.27% lower than that of
Imagic, indicating higher fidelity.
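The abstract reports the LPIPS result as a percentage decrease. Assuming it is a relative reduction with respect to Imagic's score (the abstract does not state this explicitly), the two scores are related as follows. Because LPIPS is a perceptual distance, a lower value means the edited image stays perceptually closer to the original:

$$
\frac{\mathrm{LPIPS}_{\mathrm{Imagic}} - \mathrm{LPIPS}_{\mathrm{ours}}}{\mathrm{LPIPS}_{\mathrm{Imagic}}} \times 100\% = 30.27\%
\quad\Longrightarrow\quad
\mathrm{LPIPS}_{\mathrm{ours}} = (1 - 0.3027)\,\mathrm{LPIPS}_{\mathrm{Imagic}} = 0.6973\,\mathrm{LPIPS}_{\mathrm{Imagic}}
$$

For example, a hypothetical Imagic score of 0.40 would correspond to roughly 0.28 for the proposed method. These numbers are purely illustrative, not reported results.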

Key words: Diffusion model, Image editing, Low-Rank Adaptation (LoRA), Model fine-tuning, U-net
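The LoRA fine-tuning step described in the abstract can be illustrated with a minimal NumPy sketch of a single adapted linear layer. It follows the generic LoRA formulation, in which a frozen pretrained weight W receives a trainable low-rank update (alpha / r) * B A. The class name `LoRALinear`, the rank, alpha, learning rate, and the plain squared-error target are hypothetical choices for illustration. The paper's actual adapter placement inside the Stable Diffusion U-net and its training objective are not specified here.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch).

    Computes y = x W^T + (alpha / r) * x A^T B^T, where W is the frozen pretrained
    weight and only A (r x d_in) and B (d_out x r) are trained. The default rank,
    alpha and learning rate are hypothetical, not the paper's settings.
    """

    def __init__(self, weight: np.ndarray, rank: int = 2, alpha: float = 2.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = weight                           # pretrained W, shape (d_out, d_in); never updated
        d_out, d_in = weight.shape
        self.A = rng.normal(0.0, 0.1, (rank, d_in))    # trainable down-projection, small random init
        self.B = np.zeros((d_out, rank))               # trainable up-projection, zero init
        self.scale = alpha / rank

    def forward(self, x: np.ndarray) -> np.ndarray:
        """Map a batch x of shape (n, d_in) to outputs of shape (n, d_out)."""
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def train_step(self, x: np.ndarray, target: np.ndarray, lr: float = 0.1) -> float:
        """One gradient-descent step that updates only A and B.

        Loss: 0.5 * squared error summed over outputs, averaged over the batch.
        Returns the loss measured before the update.
        """
        n = x.shape[0]
        err = self.forward(x) - target
        grad_y = err / n                               # dLoss/dy
        h = x @ self.A.T                               # low-rank activations, shape (n, rank)
        grad_B = self.scale * grad_y.T @ h             # shape (d_out, rank)
        grad_A = self.scale * (grad_y @ self.B).T @ x  # shape (rank, d_in)
        self.A -= lr * grad_A
        self.B -= lr * grad_B
        return 0.5 * float(np.sum(err ** 2)) / n

    def merged_weight(self) -> np.ndarray:
        """Fold the learned update into one matrix so inference costs nothing extra."""
        return self.weight + self.scale * self.B @ self.A


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(8, 16))
    layer = LoRALinear(W)
    x = rng.normal(size=(32, 16))
    # Hypothetical fine-tuning target: outputs of a slightly perturbed weight.
    target = x @ (W + 0.05 * rng.normal(size=W.shape)).T
    losses = [layer.train_step(x, target) for _ in range(200)]
    print(f"trainable params: {layer.A.size + layer.B.size} vs full: {W.size}")
    print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because B starts at zero, the adapted layer initially reproduces the pretrained layer exactly, so fine-tuning begins from the original model's behaviour. Only `rank * (d_in + d_out)` parameters are trained instead of `d_in * d_out`, and `merged_weight()` folds the update back into a single matrix after training.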

CLC number: