Journal of Computer Applications
邵培荣 (SHAO Peirong), 蔺素珍 (LIN Suzhen), 王彦博 (WANG Yanbo)
Abstract: To address the limitations of current virtual try-on methods in preserving the local details of target garments, and the loss of high-frequency details in the model's hands and face caused by the Variational AutoEncoder (VAE) mapping input data into a low-dimensional space when diffusion models are used for generation, a human-centric detail-enhanced virtual try-on method was proposed. Firstly, the clothing-agnostic person map, pose map, and target garment were input into a geometric matching module to generate a coarsely warped garment. Then, a Garment Warp Refinement (GWR) module was constructed to enhance the detailed features of the coarsely warped garment. Subsequently, the warped garment, clothing-agnostic person map, and pose map were concatenated and fed into a UNet together with textual features, where textual and visual features were fused and a clear image was generated progressively through iterative denoising. Next, a Mask Feature Connection (MFC) module incorporating coordinate attention was constructed to localize the model's position more accurately and preserve high-frequency details in regions such as the hands and face, making the results human-centric. Finally, the outputs of the MFC module and the UNet were fused and decoded to produce the final try-on result. Experimental results demonstrate that, compared with the LADI-VTON method, the proposed method improves the Structural Similarity Index Measure (SSIM) by 1.41% on the Dress Code dataset and reduces the Learned Perceptual Image Patch Similarity (LPIPS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID) by 7.32%, 31%, and 65%, respectively, indicating superior virtual try-on performance.
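The coordinate attention used inside the MFC module is a published, general-purpose attention block rather than something specific to this paper. Below is a minimal PyTorch sketch of a standard coordinate-attention design, not the authors' released code; the channel count, reduction ratio, and feature-map size in the usage lines are illustrative assumptions.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: factorizes attention into two 1-D poolings along
    the height and width axes, so the resulting maps encode position as well
    as channel importance (a hypothetical stand-in for the block in the MFC module)."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                          # pool over width  -> (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)      # pool over height -> (N, C, W, 1)
        y = torch.cat([x_h, x_w], dim=2)              # joint encoding   -> (N, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # height attention (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # width attention  (N, C, 1, W)
        return x * a_h * a_w                          # position-aware re-weighting

# Illustrative usage: re-weight mask features before fusing them with the UNet output.
attn = CoordinateAttention(channels=64)
mask_features = torch.randn(1, 64, 128, 96)           # assumed shape, not from the paper
weighted = attn(mask_features)                         # same shape as the input

Pooling separately along the height and width axes keeps positional information in the attention maps, which is consistent with the abstract's claim that this block helps localize the model's hands and face more precisely than plain channel attention would.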
Key words: virtual try-on, detail enhancement, coordinate attention, human-centric, diffusion model
CLC Number: TP391.41
SHAO Peirong, LIN Suzhen, WANG Yanbo. Human-centric detail-enhanced virtual try-on method[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025040475.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025040475