Discrete state space diffusion models have shown significant advantages in applications involving discrete data, such as text and image generation. Their performance is also known to be highly sensitive to the choice of rate matrix, particularly between uniform and absorbing rate matrices. While empirical results suggest that absorbing rate matrices often yield better generation quality than uniform ones, existing theoretical work has largely focused on the uniform case. Notably, convergence guarantees and error analyses for absorbing diffusion models are still missing. In this work, we provide the first finite-time error bounds and convergence rate analysis for discrete diffusion models using absorbing rate matrices. We begin by deriving an upper bound on the KL divergence of the forward process, introducing a surrogate initialization distribution to address the challenge posed by the absorbing stationary distribution, which is a singleton and therefore leaves the KL divergence ill-defined. We then establish the first convergence guarantees for both the $\tau$-leaping and uniformization samplers under absorbing rate matrices, demonstrating improved rates over their counterparts using uniform rate matrices. Furthermore, under suitable assumptions, we provide convergence guarantees without early stopping. Our analysis introduces several new technical tools to address challenges unique to absorbing rate matrices. These include a Jensen-type argument for bounding forward process convergence, novel techniques for bounding absorbing score functions, and a non-divergent upper bound on the score near initialization that removes the need for early stopping.
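To make the ill-defined KL divergence concrete, the following is a minimal sketch for a single token. It assumes a unit-rate, time-homogeneous rate matrix of the form $Q = \mathbf{1}\pi^\top - I$, which covers both the uniform case ($\pi$ uniform) and the absorbing case ($\pi$ a point mass on a mask state). The data distribution `p0`, the smoothing parameter `eps`, and the particular surrogate (a small mass `eps` spread over the non-mask states) are illustrative assumptions, not necessarily the construction used in this work.

```python
import math

def marginal(p0, pi, t):
    """Forward marginal at time t for the rate matrix Q = 1 pi^T - I.

    Because 1 pi^T is a projection, exp(tQ) = e^{-t} I + (1 - e^{-t}) 1 pi^T,
    which gives p_t = e^{-t} p0 + (1 - e^{-t}) pi.
    """
    decay = math.exp(-t)
    return [decay * a + (1.0 - decay) * b for a, b in zip(p0, pi)]

def kl(p, q):
    """KL(p || q); returns inf when p puts mass where q has none."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0.0:
            continue
        if b == 0.0:
            return math.inf
        total += a * math.log(a / b)
    return total

S = 4                                  # three data states plus a mask state (last index)
p0 = [0.7, 0.2, 0.1, 0.0]              # hypothetical data distribution, no mass on the mask
uniform_pi = [1.0 / S] * S             # stationary distribution of the uniform process
absorbing_pi = [0.0, 0.0, 0.0, 1.0]    # point mass on the mask state

# Assumed surrogate: keep almost all mass on the mask, spread eps elsewhere.
eps = 1e-3
surrogate_pi = [eps / (S - 1)] * (S - 1) + [1.0 - eps]

t = 5.0
kl_uniform = kl(marginal(p0, uniform_pi, t), uniform_pi)
kl_absorbing = kl(marginal(p0, absorbing_pi, t), absorbing_pi)
kl_surrogate = kl(marginal(p0, absorbing_pi, t), surrogate_pi)

print("uniform:  ", kl_uniform)
print("absorbing:", kl_absorbing)
print("surrogate:", kl_surrogate)
```

At any finite time the absorbing forward marginal still places some mass on non-mask states, so its KL divergence to the point mass is infinite, while the divergence to the smoothed surrogate stays finite.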