Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3574-3580. DOI: 10.11772/j.issn.1001-9081.2023111570
Received:
2023-11-15
Revised:
2024-03-01
Accepted:
2024-03-05
Online:
2024-03-12
Published:
2024-11-10
Contact:
Xuezhong XIAO
About author:
LIU Yusheng, born in 2001 in Xuzhou, Jiangsu, M. S. candidate. His research interests include computer vision and image composition.
Abstract:
To address the problems of mainstream image editing methods, such as narrow task coverage, unfriendly operation, and low fidelity, a diffusion-model-based method for high-fidelity image editing was proposed. The method takes the mainstream Stable Diffusion model as the backbone network. First, the model was fine-tuned with the Low-Rank Adaptation (LoRA) method so that it could reconstruct the original image more faithfully; then, the fine-tuned model performed inference on the image and a simple prompt through the designed framework to generate the edited image. On this basis, a dual-layer U-Net structure was further proposed for image editing tasks with specific requirements and for video synthesis. Comparative experiments with the leading methods Imagic, DiffEdit, and InstructPix2Pix on the TEdBench dataset show that the proposed method can perform various editing tasks on images, including non-rigid editing, with strong editability; moreover, its Learned Perceptual Image Patch Similarity (LPIPS) index is 30.38% lower than that of Imagic, indicating higher fidelity.
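The two-stage pipeline the abstract describes can be illustrated with a minimal sketch. This is not the authors' released implementation: it uses the Hugging Face diffusers and peft libraries as stand-ins, and the checkpoint ID, LoRA rank, target module names, file names, prompt, and strength value are all illustrative assumptions.

```python
# Minimal sketch of the two-stage pipeline: (1) fine-tune Stable Diffusion with
# LoRA so it can faithfully reconstruct the input image, then (2) run inference
# on the image plus a simple prompt to produce the edited result.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from peft import LoraConfig
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed backbone checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Stage 1 (assumed setup): attach low-rank adapters to the U-Net attention
# projections; only these LoRA weights would be optimized, on the single
# input image, while the backbone stays frozen (training loop elided).
lora_config = LoraConfig(
    r=4, lora_alpha=4,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
pipe.unet.add_adapter(lora_config)
# ... optimize the LoRA weights to reconstruct input.png ...

# Stage 2: denoise the partially noised input under a simple prompt;
# `strength` trades fidelity to the original against edit strength.
init_image = Image.open("input.png").convert("RGB")
edited = pipe(prompt="a sitting dog", image=init_image, strength=0.6).images[0]
edited.save("edited.png")
```

Keeping the backbone frozen and training only the per-image LoRA weights is what makes per-image fine-tuning cheap while preserving the prior needed for faithful reconstruction.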
CLC number:
Yusheng LIU, Xuezhong XIAO. High-fidelity image editing based on fine-tuning of diffusion model[J]. Journal of Computer Applications, 2024, 44(11): 3574-3580.
Tab. 1 Comparison of numbers of valid editing results

| Method | Non-rigid editing (60) | Object addition (21) | Subject replacement (11) | Background replacement (8) | Total (100 tasks) |
| --- | --- | --- | --- | --- | --- |
| Proposed method | 58 | 21 | 11 | 8 | 98 |
| Imagic | 56 | 20 | 11 | 8 | 95 |
| Img2Img | 46 | 13 | 3 | 5 | 67 |
| DiffEdit | 11 | 4 | 10 | 1 | 26 |
| InstructPix2Pix | 12 | 12 | 10 | 6 | 40 |
Tab. 2 Comparison of fidelity

| Method | CLIP Score | LPIPS |
| --- | --- | --- |
| Imagic | 25.1862 | 0.5507 |
| Img2Img | 23.8855 | 0.6345 |
| Proposed method | 25.3181 | 0.3834 |
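As a consistency check on the abstract's fidelity claim, the relative LPIPS reduction over Imagic follows directly from Tab. 2: (0.5507 - 0.3834) / 0.5507 ≈ 30.38%, where a lower LPIPS means the edited result stays perceptually closer to the original image; the proposed method also achieves the highest CLIP Score, where higher means better text-image alignment.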
References:
[1] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8748-8763.
[2] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 586-595.
[3] BROOKS T, HOLYNSKI A, EFROS A A. InstructPix2Pix: learning to follow image editing instructions[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 18392-18402.
[4] HERTZ A, MOKADY R, TENENBAUM J, et al. Prompt-to-Prompt image editing with cross-attention control[EB/OL]. [2023-09-12].
[5] COUAIRON G, VERBEEK J, SCHWENK H, et al. DiffEdit: diffusion-based semantic image editing with mask guidance[EB/OL]. [2023-08-22].
[6] KAWAR B, ZADA S, LANG O, et al. Imagic: text-based real image editing with diffusion models[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 6007-6017.
[7] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 2. Cambridge: MIT Press, 2014: 2672-2680.
[8] NICHOL A, DHARIWAL P. Improved denoising diffusion probabilistic models[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8162-8171.
[9] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2020: 6840-6851.
[10] LIU H, WAN Z, HUANG W, et al. PD-GAN: probabilistic diverse GAN for image inpainting[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 9367-9376.
[11] JING Y, YANG Y, FENG Z, et al. Neural style transfer: a review[J]. IEEE Transactions on Visualization and Computer Graphics, 2020, 26(11): 3365-3385.
[12] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2223-2232.
[13] PAN X, TEWARI A, LEIMKÜHLER T, et al. Drag your GAN: interactive point-based manipulation on the generative image manifold[C]// Proceedings of the 2023 ACM SIGGRAPH Conference. New York: ACM, 2023: No.78.
[14] ABDAL R, ZHU P, MITRA N J, et al. StyleFlow: attribute-conditioned exploration of StyleGAN-generated images using conditional continuous normalizing flows[J]. ACM Transactions on Graphics, 2021, 40(3): No.21.
[15] PATASHNIK O, WU Z, SHECHTMAN E, et al. StyleCLIP: text-driven manipulation of StyleGAN imagery[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 2065-2074.
[16] GAL R, PATASHNIK O, MARON H, et al. StyleGAN-NADA: CLIP-guided domain adaptation of image generators[J]. ACM Transactions on Graphics, 2022, 41(4): 1-13.
[17] XIA W, YANG Y, XUE J H, et al. TediGAN: text-guided diverse face image generation and manipulation[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2256-2265.
[18] ABDAL R, ZHU P, FEMIANI J, et al. CLIP2StyleGAN: unsupervised extraction of StyleGAN edit directions[C]// Proceedings of the 2022 ACM SIGGRAPH Conference. New York: ACM, 2022: No.48.
[19] CROWSON K, BIDERMAN S, KORNIS D, et al. VQGAN-CLIP: open domain image generation and editing with natural language guidance[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13697. Cham: Springer, 2022: 88-105.
[20] ESSER P, ROMBACH R, OMMER B. Taming Transformers for high-resolution image synthesis[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 12868-12878.
[21] MOKADY R, TOV O, YAROM M, et al. Self-distilled StyleGAN: towards generation from internet photos[C]// Proceedings of the 2022 ACM SIGGRAPH Conference. New York: ACM, 2022: No.50.
[22] GONG S, LI M, FENG J, et al. DiffuSeq: sequence to sequence text generation with diffusion models[EB/OL]. [2023-10-12].
[23] RAMESH A, PAVLOV M, GOH G, et al. Zero-shot text-to-image generation[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8821-8831.
[24] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 10674-10685.
[25] ZHANG L, RAO A, AGRAWALA M. Adding conditional control to text-to-image diffusion models[C]// Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 3813-3824.
[26] LUGMAYR A, DANELLJAN M, ROMERO A, et al. RePaint: inpainting using denoising diffusion probabilistic models[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11451-11461.
[27] SAHARIA C, CHAN W, CHANG H, et al. Palette: image-to-image diffusion models[C]// Proceedings of the 2022 ACM SIGGRAPH Conference. New York: ACM, 2022: No.15.
[28] RUIZ N, LI Y, JAMPANI V, et al. DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 22500-22510.
[29] SUN Z, ZHOU Y, HE H, et al. SGDiff: a style guided diffusion model for fashion synthesis[C]// Proceedings of the 31st ACM International Conference on Multimedia. New York: ACM, 2023: 8433-8442.
[30] MENG C, HE Y, SONG Y, et al. SDEdit: guided image synthesis and editing with stochastic differential equations[EB/OL]. [2023-08-05].
[31] KIM G, KWON T, YE J C. DiffusionCLIP: text-guided diffusion models for robust image manipulation[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 2416-2425.
[32] HOU C, WEI G, CHEN Z. High-fidelity diffusion-based image editing[C]// Proceedings of the 38th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2024: 2184-2192.
[33] VALEVSKI D, KALMAN M, MOLAD E, et al. UniTune: text-driven image editing by fine tuning a diffusion model on a single image[J]. ACM Transactions on Graphics, 2023, 42(4): No.128.
[34] AVRAHAMI O, LISCHINSKI D, FRIED O. Blended diffusion for text-driven editing of natural images[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 18187-18197.
[35] HU E, SHEN Y, WALLIS P, et al. LoRA: low-rank adaptation of large language models[EB/OL]. [2023-06-22].
[36] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 9351. Cham: Springer, 2015: 234-241.
[37] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[38] SONG J, MENG C, ERMON S. Denoising diffusion implicit models[EB/OL]. [2023-11-25].
[39] CHAKRAVARTHI A, GURURAJA H S. Classifier-free guidance for Generative Adversarial Networks (GANs)[C]// Proceedings of the 2022 International Conference on Intelligent Computing and Communication, AISC 1447. Singapore: Springer, 2023: 217-232.
[40] CAO M, WANG X, QI Z, et al. MasaCtrl: tuning-free mutual self-attention control for consistent image synthesis and editing[C]// Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 22503-22513.