
Although variational autoencoders (VAEs) represent a widely influential class of deep generative models, many aspects of the underlying energy function remain poorly understood. In particular, it is commonly believed that Gaussian encoder/decoder assumptions reduce the effectiveness of VAEs in generating realistic samples. In this regard, we rigorously analyze the VAE objective, differentiating situations where this belief is and is not actually true. We then leverage the corresponding insights to develop a simple VAE enhancement that requires no additional hyperparameters or sensitive tuning. Quantitatively, this proposal produces crisp samples and stable FID scores that are actually competitive with a variety of GAN models, all while retaining desirable attributes of the original VAE architecture. A shorter version of this work will appear in the ICLR 2019 proceedings (Dai and Wipf, 2019). The code for our model is available at this https URL (TwoStageVAE).
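The enhancement referenced above (per the TwoStageVAE repository name and Dai and Wipf, 2019) fits a second, small VAE to the latent codes of the first, and samples by chaining the two decoders. A minimal sketch of that two-stage sampling pipeline, where untrained stub decoders stand in for the real networks and all shapes and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: second-stage latent u, first-stage latent z, data x.
d_u, d_z, d_x = 2, 4, 8
W2 = rng.normal(size=(d_u, d_z))  # stand-in weights for the stage-2 decoder
W1 = rng.normal(size=(d_z, d_x))  # stand-in weights for the stage-1 decoder

def decode_stage2(u):
    """Stub for the second VAE's decoder: maps u to first-stage latents z."""
    return np.tanh(u @ W2)

def decode_stage1(z):
    """Stub for the first VAE's decoder: maps z to data space."""
    return np.tanh(z @ W1)

# Two-stage sampling: u ~ N(0, I) -> z = decode_stage2(u) -> x = decode_stage1(z).
# The point is that z is drawn via the second VAE's learned model of the
# first VAE's latent distribution, rather than directly from N(0, I).
u = rng.normal(size=(16, d_u))
z = decode_stage2(u)
x = decode_stage1(z)
print(x.shape)  # (16, 8)
```

With trained networks in place of the stubs, only the sampling path changes; the first-stage architecture and training objective are left as in a standard VAE.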


It is increasingly considered that human speech perception and production both rely on articulatory representations. In this paper, we investigate whether this type of representation could improve the performance of a deep generative model (here a variational autoencoder) trained to encode and decode acoustic speech features. First we develop an articulatory model able to associate articulatory parameters describing the jaw, tongue, lips and velum configurations with vocal tract shapes and spectral features. Then we incorporate these articulatory parameters into a variational autoencoder applied to spectral features by using a regularization technique that constrains part of the latent space to follow articulatory trajectories. We show that this articulatory constraint improves model training by decreasing time to convergence and reconstruction loss at convergence, and yields better performance in a speech denoising task.
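The regularization described above constrains part of the latent space to follow articulatory trajectories. One plausible form, sketched here as a squared-error penalty on the first few latent dimensions (the function name, the penalty form, and the weight are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def articulatory_regularizer(z, art_params, n_art, weight=1.0):
    """Penalize deviation of the first n_art latent dimensions from measured
    articulatory parameters (jaw, tongue, lips, velum). The squared-error
    form and the fixed weight are illustrative assumptions."""
    diff = z[:, :n_art] - art_params
    return weight * np.mean(np.sum(diff ** 2, axis=1))

rng = np.random.default_rng(0)
batch, d_z, n_art = 32, 16, 6
z = rng.normal(size=(batch, d_z))          # latent codes from the encoder
art = rng.normal(size=(batch, n_art))      # articulatory parameters for the batch

# This term would be added to the usual VAE loss:
#   total = reconstruction + KL + articulatory_regularizer(...)
reg = articulatory_regularizer(z, art, n_art, weight=0.1)
print(reg >= 0.0)  # True
```

Driving this term to zero ties the designated latent dimensions to the articulatory trajectory while the remaining dimensions stay free to encode other spectral variation.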
