Generative models of natural images have progressed towards high-fidelity samples by leveraging scale. We attempt to carry this success over to video modeling by showing that large Generative Adversarial Networks trained on the complex Kinetics-600 dataset can produce video samples of substantially higher complexity than previous work. Our proposed network, Dual Video Discriminator GAN (DVD-GAN), scales to longer and higher-resolution videos by leveraging a computationally efficient decomposition of its discriminator. We evaluate on the related tasks of video synthesis and video prediction, achieving a new state-of-the-art Fréchet Inception Distance for prediction on Kinetics-600 and a state-of-the-art Inception Score for synthesis on the UCF-101 dataset, while also establishing a number of strong baselines on Kinetics-600.