The integration of discrete algorithmic components into deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE; Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed, combining implicit differentiation through perturbation with the path-wise gradient estimator. However, due to its finite-difference approximation of the gradients, IMLE is especially sensitive to the choice of the finite-difference step size, which needs to be specified by the user. In this work, we present Adaptive IMLE (AIMLE), the first adaptive gradient estimator for complex discrete distributions: it adaptively identifies the target distribution for IMLE by trading off the density of gradient information against the degree of bias in the gradient estimates. We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoder, and Neural Relational Inference tasks. In our experiments, we show that our adaptive gradient estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.