When designing a neural caption generator, a convolutional neural network can be used to extract image features. Is it possible to also use a neural language model to extract sentence prefix features? We answer this question by trying different ways to transfer the recurrent neural network and embedding layer from a neural language model to an image caption generator. We find that image caption generators with transferred parameters perform better than those trained from scratch, even when the language model is pre-trained only on the text of the same captions dataset that the caption generator will later be trained on. We also find that the best language models (in terms of perplexity) do not result in the best caption generators after transfer learning.
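The transfer described above amounts to copying the embedding and recurrent weights from a trained language model into a caption generator, while layers unique to the caption model (such as the image-feature projection) keep their fresh initialisation. A minimal sketch follows; the parameter names, shapes, and dict-based storage are illustrative assumptions, not details from the paper:

```python
# Hypothetical sketch: transferring language-model parameters to a
# caption generator. Both models are represented as plain dicts of
# NumPy arrays; names and shapes are illustrative only.
import numpy as np

def init_params(vocab_size, embed_dim, hidden_dim, seed):
    rng = np.random.default_rng(seed)
    return {
        "embedding": rng.normal(size=(vocab_size, embed_dim)),
        "rnn_W": rng.normal(size=(embed_dim, hidden_dim)),   # input-to-hidden
        "rnn_U": rng.normal(size=(hidden_dim, hidden_dim)),  # hidden-to-hidden
        # Image projection exists only in the caption generator; it is
        # never transferred and stays randomly initialised.
        "image_proj": rng.normal(size=(2048, hidden_dim)),
    }

def transfer(lm_params, cap_params, layers=("embedding", "rnn_W", "rnn_U")):
    """Copy the shared layers from the language model into the caption
    generator, leaving caption-only layers untouched."""
    for name in layers:
        cap_params[name] = lm_params[name].copy()
    return cap_params

lm = init_params(10000, 256, 512, seed=0)   # stands in for a pre-trained LM
cap = init_params(10000, 256, 512, seed=1)  # caption generator, from scratch
cap = transfer(lm, cap)

# Shared layers now match the language model; the image projection does not.
print(np.array_equal(cap["embedding"], lm["embedding"]))  # True
print(np.array_equal(cap["image_proj"], lm["image_proj"]))  # False
```

In practice the transferred weights would then be fine-tuned (or frozen) during caption-generator training; the sketch only shows the copy step itself.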