Density ratio estimation serves as an important technique in the unsupervised machine learning toolbox. However, such ratios are difficult to estimate for complex, high-dimensional data, particularly when the densities of interest are sufficiently different. In our work, we propose to leverage an invertible generative model to map the two distributions into a common feature space prior to estimation. This featurization brings the densities closer together in latent space, sidestepping pathological scenarios where the learned density ratios in input space can be arbitrarily inaccurate. At the same time, the invertibility of our feature map guarantees that the ratios computed in feature space are equivalent to those in input space. Empirically, we demonstrate the efficacy of our approach in a variety of downstream tasks that require access to accurate density ratios such as mutual information estimation, targeted sampling in deep generative models, and classification with data augmentation.
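The invertibility claim follows from the change-of-variables formula: if z = f(x) with f invertible, both densities pick up the same Jacobian factor, which cancels in the ratio. The sketch below checks this numerically in a toy setting. It is an illustration only, not the method proposed here: it assumes one-dimensional Gaussians and a fixed affine map in place of a learned invertible generative model, and every name and constant in it is hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    var = std * std
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Assumed input-space densities (illustrative): p = N(0, 1), q = N(3, 2^2).
P_MEAN, P_STD = 0.0, 1.0
Q_MEAN, Q_STD = 3.0, 2.0

# Assumed invertible feature map (illustrative): f(x) = A * x + B with A != 0.
A, B = 0.5, -1.0


def f(x):
    """Map an input-space point to feature space."""
    return A * x + B


def ratio_input_space(x):
    """Density ratio p(x) / q(x) computed directly in input space."""
    return normal_pdf(x, P_MEAN, P_STD) / normal_pdf(x, Q_MEAN, Q_STD)


def ratio_feature_space(z):
    """Density ratio of the pushforward densities, evaluated at z.

    Under an affine map, N(m, s^2) is pushed forward to
    N(A * m + B, (|A| * s)^2), so both densities are available in closed form.
    """
    p_z = normal_pdf(z, A * P_MEAN + B, abs(A) * P_STD)
    q_z = normal_pdf(z, A * Q_MEAN + B, abs(A) * Q_STD)
    return p_z / q_z


def max_relative_gap(points):
    """Largest relative difference between the two ratios over the given points."""
    return max(
        abs(ratio_feature_space(f(x)) - ratio_input_space(x)) / ratio_input_space(x)
        for x in points
    )


if __name__ == "__main__":
    xs = [-2.0, -0.5, 0.0, 1.0, 2.5, 4.0]
    for x in xs:
        print(x, ratio_input_space(x), ratio_feature_space(f(x)))
    print("max relative gap:", max_relative_gap(xs))
```

In this toy case the two ratios agree up to floating-point error at every test point. The affine map does not bring the two densities closer together; it only illustrates that an invertible map leaves the ratio unchanged, which is what allows estimation to be carried out in feature space.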