We propose a function-valued evaluation metric for generative models, based on the relative density ratio (RDR), that characterizes distributional differences between real and generated samples. As an evaluation metric, the RDR function preserves the $φ$-divergence between the two distributions, enables sample-level evaluation that facilitates downstream investigation of feature-specific distributional differences, and has a bounded range that affords clear interpretability and numerical stability. The RDR function is estimated efficiently by optimizing the variational form of the $φ$-divergence. We provide theoretical convergence rate guarantees for general estimators based on M-estimator theory, as well as the convergence rate of neural network-based estimators when the true ratio lies in an anisotropic Besov space. We demonstrate the power of the proposed RDR-based evaluation through numerical experiments on MNIST, CelebA64, and the American Gut Project microbiome data. We show that the estimated RDR not only enables effective overall comparison of competing generative models but also offers a convenient way to reveal the underlying nature of goodness-of-fit. It allows one to assess support overlap, coverage, and fidelity while pinpointing regions of the sample space where generators concentrate and revealing the features that drive the most salient distributional differences.
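To make the bounded-range claim concrete, the following is a minimal sketch rather than the estimator described above. It assumes the RDR takes the form commonly used in the density-ratio literature, $r_\alpha(x) = p(x) / (\alpha p(x) + (1-\alpha) q(x))$ with $0 < \alpha < 1$, where $p$ is taken to be the real density and $q$ the generated density; the abstract does not state the exact definition or which density is the numerator. The helper names `normal_pdf` and `relative_density_ratio`, the Gaussian toy densities, and the choice `alpha = 0.5` are all illustrative assumptions. Known densities are plugged in directly, whereas in practice the ratio would be estimated from samples.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian (toy stand-in for real or generated data)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def relative_density_ratio(p_x, q_x, alpha=0.5):
    """Assumed RDR form: p / (alpha * p + (1 - alpha) * q).

    For non-negative inputs the value lies in [0, 1 / alpha], which is the
    kind of bounded range the abstract refers to.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    mixture = alpha * p_x + (1.0 - alpha) * q_x
    if mixture == 0.0:
        # Neither density places mass here, so the ratio is undefined.
        return math.nan
    return p_x / mixture


alpha = 0.5
grid = [-6.0 + 0.1 * i for i in range(121)]  # evaluation points in [-6, 6]

# Hypothetical setup: real data ~ N(0, 1), generator ~ N(1, 0.5^2).
ratios = [
    relative_density_ratio(normal_pdf(x, 0.0, 1.0), normal_pdf(x, 1.0, 0.5), alpha)
    for x in grid
]

# Sample-level reading: values near 1 / alpha mark regions the generator
# under-covers; values near 0 mark regions where it over-concentrates.
for x, r in zip(grid[::20], ratios[::20]):
    print(f"x = {x:+.1f}   r_alpha(x) = {r:.4f}")
```

Under this assumed form, a ratio equal to 1 means the two densities agree at that point, so departures from 1 in either direction localize where the generator and the real data differ.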