This paper proposes a variational self-attention model (VSAM) that employs variational inference to derive self-attention. We model the self-attention vector as a random variable by imposing a probabilistic distribution over it. The self-attention mechanism summarizes source information into an attention vector by a weighted sum, where the weights form a learned probability distribution. Compared with its conventional deterministic counterpart, the stochastic units incorporated by VSAM allow multi-modal attention distributions. Furthermore, by marginalizing over the latent variables, VSAM is more robust against overfitting. Experiments on the stance detection task demonstrate the superiority of our method.
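The following is a minimal sketch of the idea described above, not the authors' exact architecture: attention scores are treated as latent Gaussian variables and sampled via the reparameterization trick, the resulting softmax weights give the attention distribution over source positions, and a KL term against a standard normal prior provides the regularization obtained from marginalizing over the latent variables. The class name, layer shapes, and the Gaussian prior are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalSelfAttention(nn.Module):
    """Illustrative stochastic self-attention (assumed Gaussian latent scores)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.score_mu = nn.Linear(hidden_dim, 1)      # mean of latent attention score
        self.score_logvar = nn.Linear(hidden_dim, 1)  # log-variance of latent score

    def forward(self, h):
        # h: (batch, seq_len, hidden_dim) encoder states
        mu = self.score_mu(h).squeeze(-1)             # (batch, seq_len)
        logvar = self.score_logvar(h).squeeze(-1)

        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps

        # Attention weights: a distribution over source positions
        alpha = F.softmax(z, dim=-1)                  # (batch, seq_len)

        # Attention vector: weighted sum of source states
        context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)  # (batch, hidden_dim)

        # KL divergence to a standard normal prior, added to the training loss
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return context, alpha, kl
```

Because the scores are sampled rather than computed deterministically, repeated forward passes yield different attention weights, which is what permits multi-modal attention distributions; the KL term acts as the regularizer that discourages overfitting.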