BERT-hLLSTMs: 用于视觉叙事的BERT和等级LSTMs (BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling)

Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM receives as input the sentence vector representation from BERT, to learn the dependencies between the sentences corresponding to images, and the top LSTM is responsible for generating the corresponding word vector representations, taking input from the bottom LSTM. Experimental results demonstrate that our model outperforms most closely related baselines under automatic evaluation metrics BLEU and CIDEr, and also show the effectiveness of our method with human evaluation.

翻译：视觉故事说明是一项创造性和富有挑战性的任务,目的是为一系列图像自动生成类似故事的描述。以往视觉故事说明方法生成的描述缺乏一致性,因为它们使用字级序列生成方法,没有充分考虑到判决的依附性。为了解决这一问题,我们提议了一个新型的等级直观故事说明框架,分别以句级和字级语义为模型。我们使用基于变压器的BERT网络为句子和文字嵌入嵌入内容。然后我们使用一个等级LSTM网络:最底层LSTM接收来自BERT的句子矢量表示作为输入,以了解与图像对应的句子之间的依存性,而顶层LSTM负责生成相应的矢量表达方式,从底LSTM中提取投入。实验结果表明,我们的模型在自动评价指标BLEU和CIDer下,超越了最密切相关的基线。我们还展示了我们方法在人类评估方面的有效性。

相关内容

长短期记忆网络

关注 120

长短期记忆网络(LSTM)是一种用于深度学习领域的人工回归神经网络(RNN)结构。与标准的前馈神经网络不同，LSTM具有反馈连接。它不仅可以处理单个数据点(如图像)，还可以处理整个数据序列(如语音或视频)。例如，LSTM适用于未分段、连接的手写识别、语音识别、网络流量或IDSs(入侵检测系统)中的异常检测等任务。

最新《Transformers模型》教程，64页ppt

专知会员服务

284+阅读 · 2020年11月26日

最新《注意力机制》教程，112页ppt

专知会员服务

308+阅读 · 2020年11月24日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

65+阅读 · 2020年7月25日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

49+阅读 · 2020年5月26日