High-fidelity image editing based on fine-tuning of diffusion models

doi:10.11772/j.issn.1001-9081.2023111570

Journal of Computer Applications (official website)

LIU Yusheng, XIAO Xuezhong

Nanjing University of Posts and Telecommunications

Received: 2023-11-14 Revised: 2024-03-01 Online: 2024-03-12 Published: 2024-03-12
Corresponding author: XIAO Xuezhong
About the authors: LIU Yusheng, born in 2001 in Xuzhou, Jiangsu, male, M.S. candidate. His research interests include computer vision and image synthesis. XIAO Xuezhong, born in 1972 in Nanjing, Jiangsu, male, Ph.D., associate professor. His research interests include computer graphics, computer vision, and machine learning.

Abstract

Abstract: To address the narrow task coverage, unfriendly operation, and low fidelity of current mainstream image editing
methods, a diffusion model-based method for high-fidelity image editing was proposed. The method used the widely adopted Stable
Diffusion model as its backbone network. First, the model was fine-tuned with Low-Rank Adaptation (LoRA) so that it could better
reconstruct the original image. Then, the fine-tuned model took the image and a simple text prompt through a purpose-designed
inference framework to generate the edited image. Building on this method, a dual-layer U-Net structure was further proposed for
image editing tasks with specific requirements and as an exploration toward video synthesis. Comparative experiments against the
leading methods Imagic, DiffEdit, and InstructPix2Pix on the TEdBench dataset showed that the method could perform a variety of
editing tasks, including non-rigid editing, with strong editability. Its Learned Perceptual Image Patch Similarity (LPIPS) score was
30.27% lower than that of Imagic, indicating higher fidelity.

Keywords: diffusion model, image editing, Low-Rank Adaptation (LoRA), model fine-tuning, U-Net
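
The abstract names LoRA as the fine-tuning step but does not say which layers were adapted or which rank was used. The sketch below only illustrates the general LoRA idea, not the authors' implementation: a frozen weight matrix W is left untouched and a trainable low-rank product B·A, scaled by alpha / r, is added to it. All names (`lora_effective_weight`, `init_lora`) and the toy sizes and `alpha` value are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, rank, seed=0):
    """Create LoRA factors: A is small random, B is zero (the usual LoRA init)."""
    rng = random.Random(seed)
    lora_a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    lora_b = [[0.0] * rank for _ in range(d_out)]
    return lora_a, lora_b


def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape: d_out x d_in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy sizes (hypothetical): an 8 x 8 frozen layer adapted with rank 2.
d_out, d_in, rank = 8, 8, 2
frozen_w = [[float(i == j) for j in range(d_in)] for i in range(d_out)]
lora_a, lora_b = init_lora(d_out, d_in, rank)
w_eff = lora_effective_weight(frozen_w, lora_a, lora_b, alpha=4.0)

# Only A and B would be trained; W stays frozen.
trainable = rank * (d_in + d_out)
full = d_out * d_in
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen one, and only `rank * (d_in + d_out)` values are trained instead of `d_out * d_in`. The saving grows with layer size, which is what makes per-image fine-tuning of a large backbone practical.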

CLC number: TP391.41

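Cite this article: LIU Yusheng, XIAO Xuezhong. High-fidelity image editing based on fine-tuning of diffusion models[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2023111570.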

