We study the implicit bias of flow matching (FM) samplers through the lens of empirical flow matching. Although population FM may produce gradient-field velocities resembling optimal transport (OT), we show that the empirical FM minimizer is almost never a gradient field, even when each conditional flow is. Consequently, empirical FM is intrinsically energetically suboptimal. Motivated by this, we analyze the kinetic energy of generated samples. With Gaussian sources, both the instantaneous and the integrated kinetic energies exhibit exponential concentration, while heavy-tailed sources lead to polynomial tails. These behaviors are governed primarily by the choice of source distribution rather than by the data. Overall, these notes provide a concise mathematical account of the structural and energetic biases arising in empirical FM.
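The following is a minimal numerical sketch of the claim that a mixture of gradient-field conditional flows need not be a gradient field. It rests on assumptions of our own choosing, and these are not necessarily the construction used in these notes. We assume straight-line conditional paths x_t = (1 - t) x_0 + t y_i, a standard bivariate Student-t source with ν = 1, two data points, and t = 0.5. The empirical FM minimizer is the posterior-weighted average of the conditional velocities u_i(x) = (y_i - x)/(1 - t). Each u_i is the gradient of -|x - y_i|^2 / (2(1 - t)). A central-difference estimate of the 2D curl of the averaged field is nonetheless nonzero. The helper names (`student_t_logpdf_unnormalized`, `empirical_fm_velocity`, `curl_2d`) are hypothetical.

```python
import numpy as np


def student_t_logpdf_unnormalized(z, nu=1.0):
    """Unnormalized log-density of a standard multivariate Student-t (identity scale).

    Normalizing constants are dropped because they cancel in the posterior weights.
    """
    d = z.shape[-1]
    return -0.5 * (nu + d) * np.log1p(np.sum(z**2, axis=-1) / nu)


def empirical_fm_velocity(x, t, data, nu=1.0):
    """Empirical FM minimizer v(x, t) = sum_i w_i(x, t) u_i(x, t).

    Assumed conditional path: x_t = (1 - t) x0 + t y_i with x0 ~ Student-t(nu).
    Weights w_i are proportional to p_t(x | y_i), with a uniform prior over the data.
    """
    z = (x[None, :] - t * data) / (1.0 - t)  # source point that path i maps to x
    logw = student_t_logpdf_unnormalized(z, nu)  # shared Jacobian factor cancels
    w = np.exp(logw - logw.max())  # stabilized softmax over data points
    w /= w.sum()
    cond_vel = (data - x[None, :]) / (1.0 - t)  # u_i(x) = (y_i - x) / (1 - t)
    return w @ cond_vel


def curl_2d(field, x, h=1e-5):
    """Central-difference estimate of dv2/dx1 - dv1/dx2 for a 2D vector field."""
    e1, e2 = np.array([h, 0.0]), np.array([0.0, h])
    dv2_dx1 = (field(x + e1)[1] - field(x - e1)[1]) / (2 * h)
    dv1_dx2 = (field(x + e2)[0] - field(x - e2)[0]) / (2 * h)
    return dv2_dx1 - dv1_dx2


data = np.array([[1.0, 0.0], [0.0, 1.0]])  # two hypothetical data points
t = 0.5
x = np.array([0.5, 0.25])
curl = curl_2d(lambda y: empirical_fm_velocity(y, t, data), x)
print(f"curl of empirical FM velocity at {x}: {curl:.4f}")  # roughly 0.44, not 0
```

Each conditional velocity has zero curl on its own. The nonzero value comes entirely from the x-dependence of the weights w_i. Because `curl_2d` uses finite differences at a single point, this is a numerical illustration, not a proof.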