Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (3): 915-923. DOI: 10.11772/j.issn.1001-9081.2025040475

• Multimedia Computing and Computer Simulation •


Human-centric detail-enhanced virtual try-on method

Peirong SHAO, Suzhen LIN(), Yanbo WANG   

  1. College of Computer Science and Technology, North University of China, Taiyuan, Shanxi 030051, China
  • Received:2025-04-30 Revised:2025-06-14 Accepted:2025-06-23 Online:2025-06-26 Published:2026-03-10
  • Contact: Suzhen LIN
  • About author: SHAO Peirong, born in 2001, M.S. candidate, CCF member. Her research interests include virtual try-on and image processing.
    WANG Yanbo, born in 1984, Ph.D., lecturer. His research interests include image editing and virtual try-on.
  • Supported by:
    Natural Science Foundation of Shanxi Province(202303021211147)


Abstract:

To address the inability of current virtual try-on methods to adequately preserve local details of the target garment, and the loss of high-frequency detail features in the model's hands and face caused by the Variational AutoEncoder (VAE) mapping the input data to a low-dimensional space when a diffusion model is used to generate try-on results, a human-centric detail-enhanced virtual try-on method was proposed. Firstly, the clothing-agnostic human body map, the human pose map, and the target garment were input into a Geometric Matching Module (GMM) to obtain a coarsely warped garment. Secondly, a Garment Warp Refinement (GWR) module was constructed to enhance the detail features of the coarsely warped garment. Thirdly, the warped garment map, the clothing-agnostic human body map, and the human pose map were concatenated and fed into a UNet together with textual features, and the textual and image features were fused to generate a clear image progressively through denoising. Fourthly, a Mask Feature Connection (MFC) module was constructed, and a coordinate attention mechanism was introduced to localize the model's position more accurately and preserve the high-frequency detail features of the hands and face, thereby ensuring human-centric results. Finally, the outputs of the MFC module and the UNet were fused and decoded to obtain the final try-on result. Experimental results demonstrate that, compared with the LADI-VTON (LAtent DIffusion-Virtual Try-ON) method, the proposed method improves the Structural Similarity Index Measure (SSIM) by 1.41% on the Dress Code dataset, and reduces the Learned Perceptual Image Patch Similarity (LPIPS), FID (Fréchet Inception Distance), and KID (Kernel Inception Distance) metrics by 7.32%, 31.03%, and 64.56% respectively, verifying that the proposed method achieves superior virtual try-on performance.
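The coordinate attention mentioned above factorizes spatial attention into a height direction and a width direction, which is what lets the module encode positional information (e.g. where the hands and face lie). Below is a minimal NumPy sketch of the standard coordinate-attention formulation on a single (C, H, W) feature map; the weight shapes, reduction, and function names are illustrative assumptions, not the paper's exact MFC design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_hw, w_h, w_w):
    """Sketch of coordinate attention on a (C, H, W) feature map.

    x    : (C, H, W) input feature map
    w_hw : (Cr, C) shared 1x1-conv weight applied to the pooled descriptor
    w_h  : (C, Cr) 1x1-conv weight producing the height attention
    w_w  : (C, Cr) 1x1-conv weight producing the width attention
    """
    C, H, W = x.shape
    # Direction-aware pooling: average over width -> (C, H), over height -> (C, W)
    pool_h = x.mean(axis=2)
    pool_w = x.mean(axis=1)
    # Concatenate along the spatial axis, reduce channels with a shared 1x1 conv + ReLU
    y = np.concatenate([pool_h, pool_w], axis=1)   # (C, H + W)
    y = np.maximum(w_hw @ y, 0.0)                  # (Cr, H + W)
    # Split back into the two directions and form per-direction attention gates
    a_h = sigmoid(w_h @ y[:, :H])                  # (C, H), one gate per row
    a_w = sigmoid(w_w @ y[:, H:])                  # (C, W), one gate per column
    # Re-weight: out[c, i, j] = x[c, i, j] * a_h[c, i] * a_w[c, j]
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Because each output position is gated by the product of a row gate and a column gate, the attention can emphasize a specific spatial coordinate, which is the property the MFC module relies on to keep hand and face regions intact.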

Key words: virtual try-on, detail enhancement, coordinate attention mechanism, human-centric, diffusion model

CLC number: