Journal of Computer Applications (《计算机应用》), 2024, Vol. 44, Issue (4): 1093-1098. DOI: 10.11772/j.issn.1001-9081.2023050634

• Artificial Intelligence •

Location control method for generated objects by diffusion model with exciting and pooling attention

Jinsong XU, Ming ZHU, Zhiqiang LI, Shijie GUO

  1. College of Computer and Information Engineering, Hubei University, Wuhan, Hubei 430062, China
  • Received: 2023-05-23 Revised: 2023-09-12 Accepted: 2023-09-28 Online: 2023-10-17 Published: 2024-04-10
  • Contact: Ming ZHU
  • About the authors: XU Jinsong, born in 1999 in Xiangyang, Hubei, M. S. candidate. His research interests include image processing.
    ZHU Ming, born in 1978 in Wuhan, Hubei, M. S., associate professor. His research interests include big data and artificial intelligence. E-mail: zm@hubu.edu.cn
    LI Zhiqiang, born in 1999 in Xianning, Hubei, M. S. candidate. His research interests include natural language processing.
    GUO Shijie, born in 1999 in Nanyang, Henan, M. S. candidate. His research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62106069)

Abstract:

Due to the ambiguity of text and the lack of location information in training data, current state-of-the-art diffusion models cannot accurately control the locations of generated objects in an image when conditioned on text prompts. To address this issue, a spatial condition specifying each object's location range was added, and, based on the strong correlation between the cross-attention maps in U-Net and the spatial layout of the image, an attention-guidance method was proposed that controls the generation of the attention maps and thereby the locations of the generated objects. Specifically, based on the Stable Diffusion (SD) model, in the early stage of cross-attention map generation in the U-Net layers, a loss was introduced to excite high attention values inside the corresponding location range and to reduce the average attention value outside that range. The noise vector in the latent space was optimized step by step at each denoising step, thereby controlling the generation of the attention maps. Experimental results show that the proposed method can clearly control the locations of one or more objects in the generated image and, when multiple objects are generated, can reduce object omission, redundant object generation, and object fusion.

Key words: attention map, diffusion model, location control, text guidance, image generation
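
The guidance idea summarized in the abstract (raise the attention inside a target range, lower the mean attention outside it, and update the latent noise accordingly) can be illustrated with a small toy sketch. Everything below is an assumption made for illustration only: the abstract does not give the exact loss, so the form `(1 - max inside) + mean outside` is a guess at one plausible instance; `toy_attention` (an element-wise sigmoid) merely stands in for the U-Net cross-attention map; and the function names, the box format, and the finite-difference gradient are hypothetical choices, not the authors' implementation.

```python
import math


def box_mask(height, width, box):
    """Return a 0/1 mask for box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [[1 if top <= r < bottom and left <= c < right else 0
             for c in range(width)] for r in range(height)]


def guidance_loss(attn, mask):
    """Assumed loss: (1 - max attention inside box) + mean attention outside."""
    inside = [a for row_a, row_m in zip(attn, mask)
              for a, m in zip(row_a, row_m) if m]
    outside = [a for row_a, row_m in zip(attn, mask)
               for a, m in zip(row_a, row_m) if not m]
    excite = 1.0 - max(inside)  # small when some in-box value is high
    suppress = sum(outside) / len(outside) if outside else 0.0
    return excite + suppress


def toy_attention(latent):
    """Hypothetical stand-in for a cross-attention map: element-wise sigmoid."""
    return [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in latent]


def guided_step(latent, mask, step_size=1.0, eps=1e-4):
    """One latent update using a forward finite-difference gradient of the loss."""
    base = guidance_loss(toy_attention(latent), mask)
    updated = [row[:] for row in latent]
    for r in range(len(latent)):
        for c in range(len(latent[0])):
            probe = [row[:] for row in latent]
            probe[r][c] += eps
            grad = (guidance_loss(toy_attention(probe), mask) - base) / eps
            updated[r][c] = latent[r][c] - step_size * grad
    return updated


if __name__ == "__main__":
    mask = box_mask(4, 4, (0, 0, 2, 2))      # target range: top-left quadrant
    latent = [[0.0] * 4 for _ in range(4)]   # toy "noise" latent
    for step in range(5):
        print(step, guidance_loss(toy_attention(latent), mask))
        latent = guided_step(latent, mask)
```

In a real diffusion pipeline the gradient would come from automatic differentiation through the denoising network rather than finite differences, and the update would be applied only during the early denoising steps, as the abstract describes.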

CLC Number: