Earlier approaches have indirectly studied the information captured by the hidden states of recurrent and non-recurrent neural machine translation models by feeding those states into different classifiers. In this paper, we look at the encoder hidden states of both transformer and recurrent machine translation models from the nearest neighbors perspective. We investigate to what extent the nearest neighbors share information with the underlying word embeddings as well as with related WordNet entries. Additionally, we study the underlying syntactic structure of the nearest neighbors to shed light on the role of syntactic similarities in bringing the neighbors together. In contrast to the extrinsic approaches used in previous work, we compare transformer and recurrent models more intrinsically, in terms of how they capture lexical semantics and syntactic structures. In agreement with the extrinsic evaluations of earlier work, our experimental results show that transformers are superior at capturing lexical semantics, but not necessarily better at capturing the underlying syntax. Additionally, we show that the backward recurrent layer in a recurrent model learns more about the semantics of words, whereas the forward recurrent layer encodes more context.
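As a minimal sketch of the kind of nearest-neighbor analysis the abstract describes (not the paper's actual implementation), the snippet below retrieves the k nearest encoder hidden states of a token occurrence by cosine similarity and measures how much that neighbor set overlaps with the token's neighbors in the static embedding space. All names here (hidden_states, embeddings, cosine_knn, k) are illustrative assumptions, and the random matrices merely stand in for per-token encoder states and word embeddings.

```python
import numpy as np

def cosine_knn(matrix: np.ndarray, query_idx: int, k: int) -> np.ndarray:
    """Indices of the k rows nearest to matrix[query_idx] by cosine similarity."""
    normed = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]
    sims[query_idx] = -np.inf          # exclude the query itself
    return np.argsort(-sims)[:k]

def neighbour_overlap(hidden_states: np.ndarray,
                      embeddings: np.ndarray,
                      query_idx: int,
                      k: int = 10) -> float:
    """Fraction of hidden-state neighbours that are also embedding neighbours."""
    hs_nn = set(cosine_knn(hidden_states, query_idx, k))
    emb_nn = set(cosine_knn(embeddings, query_idx, k))
    return len(hs_nn & emb_nn) / k

# Toy usage: random vectors standing in for per-token encoder states
# and the corresponding word embeddings.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(100, 512))   # e.g. transformer encoder states
embeddings = rng.normal(size=(100, 512))      # underlying word embeddings
print(neighbour_overlap(hidden_states, embeddings, query_idx=0, k=10))
```

A higher overlap would indicate that the hidden-state neighborhood is still dominated by the lexical information in the embeddings, whereas a lower overlap suggests the encoder has moved toward contextual or syntactic information.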