The Transformer, introduced in Google's paper "Attention Is All You Need", is a translation architecture built entirely on attention, with no recurrence or convolution.
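
At the core of that architecture is scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^\top/\sqrt{d_k})V$. Below is a minimal NumPy sketch of just this formula; the array names and shapes are illustrative, not taken from any particular implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from "Attention Is All You Need".

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns an (n_queries, d_v) array of attention-weighted values.
    """
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled by sqrt(d_k)
    # to keep the softmax in a well-conditioned range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage with random embeddings.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```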


Title: Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

Abstract: Recent Transformer-based large-scale pre-trained models have revolutionized vision-and-language (V+L) research. Models such as ViLBERT, LXMERT and UNITER have significantly lifted the state of the art across a wide range of V+L benchmarks through joint image-text pre-training. However, little is known about the inner mechanisms behind their impressive success. To reveal the secrets behind the scene of these powerful models, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e.g., visual coreference resolution, visual relation detection, linguistic probing tasks) generalizable to standard pre-trained V+L models, aiming to decipher the inner workings of multimodal pre-training (e.g., the implicit knowledge garnered in individual attention heads, the inherent cross-modal alignment learned through contextualized multimodal embeddings). Through extensive analysis of each archetypal model architecture via these probing tasks, our key observations are: (i) pre-trained models exhibit a propensity for attending to text rather than images during inference; (ii) there exists a subset of attention heads that are tailored for capturing cross-modal interactions; (iii) the learned attention matrices in pre-trained models demonstrate patterns coherent with the latent alignment between image regions and textual words; (iv) plotted attention patterns reveal visually interpretable relations among image regions; (v) pure linguistic knowledge is also effectively encoded in the attention heads. These are valuable insights serving to guide future work towards designing better model architectures and objectives for multimodal pre-training.
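
Observation (i) suggests a simple diagnostic one can run on any such model: take an attention matrix over a mixed sequence of image-region and text tokens and measure how much attention mass each head places on the text positions. The sketch below is a hypothetical illustration of that measurement, not code from the VALUE benchmark; the attention array and the modality mask are assumed inputs.

```python
import numpy as np

def text_attention_share(attn, is_text):
    """Fraction of attention mass each head places on text tokens.

    attn:    (n_heads, seq_len, seq_len) attention weights, rows sum to 1.
    is_text: (seq_len,) boolean mask, True where the token is a word
             rather than an image region.
    Returns a (n_heads,) array: average share of each row's mass on text.
    """
    # Mass each query position sends to text tokens, per head.
    mass_on_text = attn[:, :, is_text].sum(axis=-1)   # (n_heads, seq_len)
    # Average over query positions.
    return mass_on_text.mean(axis=-1)

# Toy usage: 2 heads, 3 image-region tokens followed by 5 word tokens.
rng = np.random.default_rng(0)
attn = rng.random((2, 8, 8))
attn /= attn.sum(axis=-1, keepdims=True)              # normalize rows
is_text = np.array([False] * 3 + [True] * 5)
print(text_attention_share(attn, is_text))  # ~0.625 each: uniform random
                                            # weights track the 5/8 text share
```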

Latest Papers

We provide a graphical treatment of SAT and \#SAT on equal footing. Instances of \#SAT can be represented as tensor networks in a standard way. These tensor networks are interpreted by diagrams of the ZH-calculus: a system to reason about tensors over $\mathbb{C}$ in terms of diagrams built from simple generators, in which computation may be carried out by \emph{transformations of diagrams alone}. In general, nodes of ZH diagrams take parameters over $\mathbb{C}$ which determine the tensor coefficients; for the standard representation of \#SAT instances, the coefficients take the value $0$ or $1$. Then, by choosing the coefficients of a diagram to range over $\mathbb{B}$, we represent the corresponding instance of SAT. Thus, by interpreting a diagram either over the boolean semiring or the complex numbers, we instantiate either the \emph{decision} or \emph{counting} version of the problem. We find that for classes known to be in P, such as $2$SAT and \#XORSAT, the existence of appropriate rewrite rules allows for efficient simplification of the diagram, producing the solution in polynomial time. In contrast, for classes known to be NP-complete, such as $3$SAT, or \#P-complete, such as \#$2$SAT, the corresponding rewrite rules introduce hyperedges to the diagrams, in numbers which are not easily bounded above by a polynomial. This diagrammatic approach unifies the diagnosis of the complexity of CSPs and \#CSPs and shows promise in aiding tensor network contraction-based algorithms.
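
To make the standard representation concrete, here is a small NumPy sketch of the general tensor-network encoding of \#SAT (an illustration of the encoding only, not of the ZH-calculus machinery): each clause becomes a 0/1 tensor that vanishes only on the one assignment falsifying it, shared variables become repeated contraction indices (implicit COPY tensors), and contracting the network over the complex numbers yields the model count, while a nonzero count answers the decision problem.

```python
import numpy as np

def clause_tensor(negated):
    """0/1 tensor for a disjunctive clause over len(negated) variables.

    The entry is 0 exactly on the single assignment falsifying the clause
    (every positive literal false, every negated literal true), else 1.
    """
    t = np.ones((2,) * len(negated), dtype=int)
    falsifying = tuple(1 if neg else 0 for neg in negated)
    t[falsifying] = 0
    return t

# Formula: (x or y) and (not x or z) and (not y or not z).
C1 = clause_tensor([False, False])  # indices (x, y)
C2 = clause_tensor([True, False])   # indices (x, z)
C3 = clause_tensor([True, True])    # indices (y, z)

# Shared variables are repeated einsum indices; summing the full
# contraction over the integers gives the model count (#SAT), and
# count > 0 is the boolean-semiring / decision reading (SAT).
count = np.einsum('xy,xz,yz->', C1, C2, C3)
print(count)       # 2 satisfying assignments
print(count > 0)   # True: the instance is satisfiable
```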
