About the authors: XU Jinsong, born in 1999, M. S. candidate. His research interests include image processing. ZHU Ming, born in 1978, M. S., associate professor. His research interests include big data and artificial intelligence. LI Zhiqiang, born in 1999, M. S. candidate. His research interests include natural language processing. GUO Shijie, born in 1999, M. S. candidate. His research interests include natural language processing.
Supported by: National Natural Science Foundation of China (62106069)
Jinsong XU, Ming ZHU, Zhiqiang LI, Shijie GUO. Location control method for generated objects by diffusion model with exciting and pooling attention[J]. Journal of Computer Applications, 2024, 44(4): 1093-1098.
1 SAHARIA C, CHAN W, SAXENA S, et al. Photorealistic text-to-image diffusion models with deep language understanding[J]. Advances in Neural Information Processing Systems, 2022, 35: 36479-36494.
2 SAHARIA C, HO J, CHAN W, et al. Image super-resolution via iterative refinement[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(4): 4713-4726.
3 NICHOL A, DHARIWAL P, RAMESH A, et al. GLIDE: towards photorealistic image generation and editing with text-guided diffusion models[EB/OL]. (2022-03-08) [2023-05-10].
4 ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 10684-10695. 10.1109/cvpr52688.2022.01042
5 HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[J]. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851. 10.48550/arXiv.2006.11239
6 DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[J]. Advances in Neural Information Processing Systems, 2021, 34: 8780-8794. 10.48550/arXiv.2105.05233
7 ZHENG G, LI S, WANG H, et al. Entropy-driven sampling and training scheme for conditional diffusion generation[C]// Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 754-769. 10.1007/978-3-031-20047-2_43
8 ZHANG C, ZHANG C, ZHANG M, et al. Text-to-image diffusion model in generative AI: a survey[EB/OL]. (2023-03-14) [2023-04-02].
9 KAWAR B, ELAD M, ERMON S, et al. Denoising diffusion restoration models[J]. Advances in Neural Information Processing Systems, 2022, 35: 23593-23606.
10 LUGMAYR A, DANELLJAN M, ROMERO A, et al. RePaint: inpainting using denoising diffusion probabilistic models[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11461-11471. 10.1109/cvpr52688.2022.01117
11 MANSIMOV E, PARISOTTO E, BA J L, et al. Generating images from captions with attention[EB/OL]. (2016-02-29) [2023-05-10].
12 SCHUHMANN C, VENCU R, BEAUMONT R, et al. LAION-400M: open dataset of CLIP-filtered 400 million image-text pairs[EB/OL]. (2021-11-03) [2023-05-10].
13 SOHL-DICKSTEIN J, WEISS E, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 2256-2265. 10.48550/arXiv.1503.03585
14 SONG J, MENG C, ERMON S. Denoising diffusion implicit models[EB/OL]. (2022-10-05) [2023-05-10].
15 SONG Y, ERMON S. Generative modeling by estimating gradients of the data distribution[EB/OL]. (2020-10-10) [2023-05-10].
16 HO J, SALIMANS T. Classifier-free diffusion guidance[EB/OL]. (2022-07-26) [2023-05-10].
17 LIU V, CHILTON L B. Design guidelines for prompt engineering text-to-image generative models[C]// Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2022: 384. 10.1145/3491102.3501825
18 WITTEVEEN S, ANDREWS M. Investigating prompt engineering in diffusion models[EB/OL]. (2022-11-21) [2023-05-10].
19 RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8748-8763.
20 HERTZ A, MOKADY R, TENENBAUM J, et al. Prompt-to-prompt image editing with cross attention control[EB/OL]. (2022-08-02) [2023-05-10].
21 CHEFER H, ALALUF Y, VINKER Y, et al. Attend-and-Excite: attention-based semantic guidance for text-to-image diffusion models[J]. ACM Transactions on Graphics, 2023, 42(4): 148.