Building on recent advances in image generation, we present a fully data-driven approach to rendering markup into images. The approach is based on diffusion models, which parameterize the data distribution as a sequence of denoising operations applied to a Gaussian noise distribution. We view the diffusion denoising process as a sequential decision-making process and show that it exhibits compounding errors similar to the exposure bias issues in imitation learning. To mitigate these issues, we adapt the scheduled sampling algorithm to diffusion training. We conduct experiments on four markup datasets: mathematical formulas (LaTeX), table layouts (HTML), sheet music (LilyPond), and molecular images (SMILES). These experiments verify the effectiveness of the diffusion process and of scheduled sampling for mitigating the observed generation issues. The results also show that the markup-to-image task provides a useful controlled compositional setting for diagnosing and analyzing generative image models.
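The core idea of adapting scheduled sampling to diffusion training can be sketched as follows: with some probability, the training input x_t is built from the model's own one-step rollout (predict x̂_0 from x_{t+1}, then re-noise to step t) rather than from the forward process on the ground-truth x_0, exposing the denoiser to its own errors during training. This is a minimal illustrative sketch, not the paper's exact procedure; the function names, the linear beta schedule, and the one-step-rollout choice are assumptions.

```python
import numpy as np

def make_alpha_bars(T=100, beta_start=1e-4, beta_end=0.02):
    # Cumulative products of (1 - beta_t) for a linear noise schedule.
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bars, eps):
    # Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps.
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

def scheduled_sample_xt(x0, t, alpha_bars, eps_model, p_model, rng):
    """Return (x_t, eps) for training the denoiser at step t.

    With probability p_model, x_t is built from the model's own
    one-step rollout from t+1 (scheduled sampling); otherwise it
    comes from the usual forward process on the ground-truth x0.
    """
    eps = rng.standard_normal(x0.shape)
    if t + 1 < len(alpha_bars) and rng.random() < p_model:
        # Roll out: noise x0 to step t+1, let the model predict x0_hat,
        # then re-noise the prediction down to step t.
        eps1 = rng.standard_normal(x0.shape)
        x_t1 = q_sample(x0, t + 1, alpha_bars, eps1)
        a1 = alpha_bars[t + 1]
        eps_hat = eps_model(x_t1, t + 1)
        x0_hat = (x_t1 - np.sqrt(1.0 - a1) * eps_hat) / np.sqrt(a1)
        return q_sample(x0_hat, t, alpha_bars, eps), eps
    return q_sample(x0, t, alpha_bars, eps), eps
```

The denoiser is then trained to predict `eps` from the returned `x_t`, so its training distribution gradually matches the distribution it sees at sampling time; `p_model` plays the role of the scheduled-sampling mixing probability.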