Neural attention (NA) has become a key component of sequence-to-sequence models that yield state-of-the-art performance in tasks as challenging as abstractive document summarization (ADS) and video captioning (VC). NA mechanisms infer context vectors, which constitute weighted sums of deterministic input sequence encodings, adaptively sourced over long temporal horizons. Inspired by recent work in the field of amortized variational inference (AVI), in this work we consider treating the context vectors generated by soft-attention (SA) models as latent variables, with approximate posteriors in the form of finite mixture models inferred via AVI. We posit that this formulation may yield stronger generalization capacity, in line with the outcomes of existing applications of AVI to deep networks. To illustrate our method, we implement it and evaluate it experimentally on challenging ADS, VC, and machine translation (MT) benchmarks, demonstrating its improved effectiveness over state-of-the-art alternatives.
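To make the construction concrete, below is a minimal PyTorch sketch contrasting standard soft attention, where the context vector is a deterministic weighted sum of the encoder states, with a variational variant that treats the context vector as a latent variable endowed with a finite-mixture approximate posterior whose parameters are amortized. All class names, layer choices, and the number of mixture components `K` are illustrative assumptions, not the paper's exact architecture; the KL regularizer needed to complete the evidence lower bound during training is also omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    """Deterministic soft attention (Bahdanau-style): the context vector
    is a weighted sum of the encoder hidden states."""
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)))
        alpha = F.softmax(scores, dim=1)           # attention weights over source
        context = (alpha * enc_states).sum(dim=1)  # deterministic context vector
        return context, alpha

class VariationalAttention(nn.Module):
    """Illustrative sketch: the context vector becomes a latent variable with
    a K-component Gaussian mixture approximate posterior, its parameters
    amortized as functions of the deterministic context."""
    def __init__(self, enc_dim, dec_dim, attn_dim, K=3):
        super().__init__()
        self.base = SoftAttention(enc_dim, dec_dim, attn_dim)
        self.K = K
        self.mix_logits = nn.Linear(enc_dim, K)            # mixture weights
        self.mu = nn.Linear(enc_dim, K * enc_dim)          # component means
        self.log_var = nn.Linear(enc_dim, K * enc_dim)     # component log-variances

    def forward(self, enc_states, dec_state):
        c_det, alpha = self.base(enc_states, dec_state)
        B, D = c_det.shape
        # Differentiable (straight-through) selection of a mixture component.
        pi = F.gumbel_softmax(self.mix_logits(c_det), hard=True)   # (B, K)
        mu = self.mu(c_det).view(B, self.K, D)
        std = (0.5 * self.log_var(c_det).view(B, self.K, D)).exp()
        samples = mu + std * torch.randn_like(std)  # reparameterized per component
        context = (pi.unsqueeze(-1) * samples).sum(dim=1)  # sampled latent context
        return context, alpha
```

In a full model, the sampled `context` would replace the deterministic context fed to the decoder at each step, and the training objective would augment the sequence loss with the KL divergence between the mixture posterior and a chosen prior, following standard amortized variational inference practice.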