HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment

Virtual try-on technology has become increasingly important in the fashion and retail industries, enabling the generation of high-fidelity garment images that adapt seamlessly to target human models. While existing methods have achieved notable progress, they still face significant challenges in maintaining consistency across different poses. Specifically, geometric distortions lead to a lack of spatial consistency, mismatches in garment structure and texture across poses result in semantic inconsistency, and the loss or distortion of fine-grained details diminishes visual fidelity. To address these challenges, we propose HF-VTON, a novel framework that ensures high-fidelity virtual try-on performance across diverse poses. HF-VTON consists of three key modules: (1) the Appearance-Preserving Warp Alignment Module (APWAM), which aligns garments to human poses, addressing geometric deformations and ensuring spatial consistency; (2) the Semantic Representation and Comprehension Module (SRCM), which captures fine-grained garment attributes and multi-pose data to enhance semantic representation, maintaining structural, textural, and pattern consistency; and (3) the Multimodal Prior-Guided Appearance Generation Module (MPAGM), which integrates multimodal features and prior knowledge from pre-trained models to optimize appearance generation, ensuring both semantic and geometric consistency. Additionally, to overcome data limitations in existing benchmarks, we introduce the SAMP-VTONS dataset, featuring multi-pose pairs and rich textual annotations for a more comprehensive evaluation. Experimental results demonstrate that HF-VTON outperforms state-of-the-art methods on both VITON-HD and SAMP-VTONS, excelling in visual fidelity, semantic consistency, and detail preservation.

