Virtual try-on technology has become increasingly important in the fashion and retail industries, enabling the generation of high-fidelity garment images that adapt seamlessly to target human models. While existing methods have achieved notable progress, they still face significant challenges in maintaining consistency across different poses. Specifically, geometric distortions lead to a lack of spatial consistency, mismatches in garment structure and texture across poses result in semantic inconsistency, and the loss or distortion of fine-grained details diminishes visual fidelity. To address these challenges, we propose HF-VTON, a novel framework that ensures high-fidelity virtual try-on performance across diverse poses. HF-VTON consists of three key modules: (1) the Appearance-Preserving Warp Alignment Module (APWAM), which aligns garments to human poses, addressing geometric deformations and ensuring spatial consistency; (2) the Semantic Representation and Comprehension Module (SRCM), which captures fine-grained garment attributes and multi-pose data to enhance semantic representation, maintaining structural, textural, and pattern consistency; and (3) the Multimodal Prior-Guided Appearance Generation Module (MPAGM), which integrates multimodal features and prior knowledge from pre-trained models to optimize appearance generation, ensuring both semantic and geometric consistency. Additionally, to overcome data limitations in existing benchmarks, we introduce the SAMP-VTONS dataset, featuring multi-pose pairs and rich textual annotations for a more comprehensive evaluation. Experimental results demonstrate that HF-VTON outperforms state-of-the-art methods on both VITON-HD and SAMP-VTONS, excelling in visual fidelity, semantic consistency, and detail preservation.