Virtual try-on has garnered interest as a neural rendering benchmark task to evaluate complex object transfer and scene composition. Recent works in virtual clothing try-on feature a plethora of possible architectural and data representation choices. However, they present little clarity on quantifying the isolated visual effect of each choice, nor do they specify the hyperparameter details that are key to experimental reproduction. Our work, ShineOn, takes a bottom-up approach to the try-on task and aims to shine light on the visual and quantitative effects of each experiment. We build a series of scientific experiments to isolate effective design choices in video synthesis for virtual clothing try-on. Specifically, we investigate the effect of different pose annotations, self-attention layer placement, and activation functions on the quantitative and qualitative performance of video virtual try-on. We find that DensePose annotations not only enhance face details but also decrease memory usage and training time. Next, we find that attention layers improve face and neck quality. Finally, we show that GELU and ReLU activation functions are the most effective in our experiments despite the appeal of newer activations such as Swish and Sine. We will release a well-organized code base, hyperparameters, and model checkpoints to support the reproducibility of our results. We expect our extensive experiments and code to greatly inform future design choices in video virtual try-on. Our code may be accessed at https://github.com/andrewjong/ShineOn-Virtual-Tryon.
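For reference, the four activation functions compared in the experiments can be written out as below. This is a generic sketch of their standard scalar formulations (the tanh-approximate GELU, Swish with β = 1, and a SIREN-style sine with ω₀ = 30), not the authors' implementation; in practice these would be applied element-wise to tensors.

```python
import math

def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def gelu(x):
    """GELU, tanh approximation (as commonly used in practice)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x). With beta=1 this is also known as SiLU."""
    return x / (1.0 + math.exp(-beta * x))

def sine(x, omega0=30.0):
    """Sine activation as used in SIREN-style networks; omega0=30 is the
    frequency scale suggested in the SIREN paper, assumed here for illustration."""
    return math.sin(omega0 * x)
```

All four pass through zero, but they differ in smoothness and saturation: ReLU is piecewise linear, GELU and Swish are smooth non-monotonic variants that behave similarly near the origin, and sine is periodic rather than saturating.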