Journal of Computer Applications
邵培荣 (SHAO Peirong), 蔺素珍 (LIN Suzhen), 王彦博 (WANG Yanbo)
Abstract: To address the limitations of current virtual try-on methods in preserving the local details of target garments, and the loss of high-frequency details in the model's hands and face caused by the Variational AutoEncoder (VAE) mapping input data into a low-dimensional space when diffusion models are used for generation, a human-centric detail-enhanced virtual try-on method was proposed. Firstly, the clothing-agnostic person map, pose map, and target garment were input into a geometric matching module to generate a coarsely warped garment. Then, a Garment Warp Refinement (GWR) module was constructed to enhance the detailed features of the coarsely warped garment. Subsequently, the warped garment, clothing-agnostic person map, and pose map were concatenated and fed into a UNet together with textual features, where textual and visual features were fused and a clear image was generated progressively through iterative denoising. Next, a Mask Feature Connection (MFC) module incorporating coordinate attention was constructed to localize the model's position more accurately and preserve high-frequency details in regions such as the hands and face, making the results human-centric. Finally, the outputs of the MFC module and the UNet were fused and decoded to produce the final try-on result. Experimental results demonstrate that, compared with the LADI-VTON method, the proposed method improves the Structural Similarity Index Measure (SSIM) by 1.41% on the Dress Code dataset and reduces the Learned Perceptual Image Patch Similarity (LPIPS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID) by 7.32%, 31%, and 65%, respectively, indicating superior virtual try-on performance.
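The coordinate attention used inside the MFC module is a published, general-purpose attention block rather than something specific to this paper. Below is a minimal PyTorch sketch of a standard coordinate-attention design, not the authors' released code; the channel count, reduction ratio, and feature-map size in the usage lines are illustrative assumptions.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: factorizes attention into two 1-D poolings along
    the height and width axes, so the resulting maps encode position as well
    as channel importance (a hypothetical stand-in for the block in the MFC module)."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                          # pool over width  -> (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)      # pool over height -> (N, C, W, 1)
        y = torch.cat([x_h, x_w], dim=2)              # joint encoding   -> (N, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # height attention (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # width attention  (N, C, 1, W)
        return x * a_h * a_w                          # position-aware re-weighting

# Illustrative usage: re-weight mask features before fusing them with the UNet output.
attn = CoordinateAttention(channels=64)
mask_features = torch.randn(1, 64, 128, 96)           # assumed shape, not from the paper
weighted = attn(mask_features)                         # same shape as the input

Pooling separately along the height and width axes keeps positional information in the attention maps, which is consistent with the abstract's claim that this block helps localize the model's hands and face more precisely than plain channel attention would.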
Key words: virtual try-on, detail enhancement, coordinate attention, human-centric, diffusion model
CLC Number: TP391.41
SHAO Peirong, LIN Suzhen, WANG Yanbo. Human-centric detail-enhanced virtual try-on method[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025040475.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025040475