视频背景对话的 " 注意视频背景对话 " (Structured Co-reference Graph Attention for Video-grounded Dialogue) - 专知论文

会员服务 ·

0

任务对话系统 · Performer · 图注意力网络 · 图 · 注意力机制 ·

2021 年 3 月 24 日

Structured Co-reference Graph Attention for Video-grounded Dialogue

翻译：视频背景对话的 " 注意视频背景对话 "

Junyeong Kim,Sunjae Yoon,Dahyun Kim,Chang D. Yoo

from arxiv, Accepted to AAAI2021

A video-grounded dialogue system referred to as the Structured Co-reference Graph Attention (SCGA) is presented for decoding the answer sequence to a question regarding a given video while keeping track of the dialogue context. Although recent efforts have made great strides in improving the quality of the response, performance is still far from satisfactory. The two main challenging issues are as follows: (1) how to deduce co-reference among multiple modalities and (2) how to reason on the rich underlying semantic structure of video with complex spatial and temporal dynamics. To this end, SCGA is based on (1) Structured Co-reference Resolver that performs dereferencing via building a structured graph over multiple modalities, (2) Spatio-temporal Video Reasoner that captures local-to-global dynamics of video via gradually neighboring graph attention. SCGA makes use of pointer network to dynamically replicate parts of the question for decoding the answer sequence. The validity of the proposed SCGA is demonstrated on AVSD@DSTC7 and AVSD@DSTC8 datasets, a challenging video-grounded dialogue benchmarks, and TVQA dataset, a large-scale videoQA benchmark. Our empirical results show that SCGA outperforms other state-of-the-art dialogue systems on both benchmarks, while extensive ablation study and qualitative analysis reveal performance gain and improved interpretability.

翻译：以视频为基础的对话系统,称为结构化共同参考图表(SCGA),用于解码对特定视频问题的答案序列,同时跟踪对话背景。虽然最近的努力在提高答复质量方面取得了长足进步,但业绩仍然远远不令人满意。两个主要挑战性问题如下:(1) 如何推算多种模式之间的共同参照,和(2) 如何解释具有复杂空间和时间动态的视频的丰富基本语义结构。为此,SCGA基于:(1) 结构化共同解码器,通过建立多模式结构化图表进行参考;(2) Spatio-时间性视频解释器,通过逐步临近图示关注捕捉到视频在本地到全球的动态;SCGA利用指示器网络动态复制问题的某些部分以解析答案序列。拟议的SCGA的有效性在AVSD@DSTC7和AVSDSD@DTC8数据集上展示,具有挑战性的视频背景对话基准,TVQA数据设置,通过渐进式的图像显示大规模对话结果,同时同时对SCTAAA进行广泛的业绩分析。我们的经验性结果展示了SBA和定性分析。

0

相关内容

任务对话系统

任务对话系统

AAAI2021 | 图神经网络的异质图结构学习，Heterogeneous Graph Structure Learning for Graph Neural Networks

专知会员服务

92+阅读 · 2021年1月20日

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【芝加哥大学】GRAPH-BERT: Only Attention is Needed for Learning Graph Representations

【芝加哥大学】GRAPH-BERT: Only Attention is Needed for Learning Graph Representations

专知会员服务

85+阅读 · 2020年1月15日

【论文】双曲图卷积神经网络（Hyperbolic Graph Convolutional Neural Networks），斯坦福大学| Ines Chami，斯坦福大学| Rex Ying

【论文】双曲图卷积神经网络（Hyperbolic Graph Convolutional Neural Networks），斯坦福大学| Ines Chami，斯坦福大学| Rex Ying

专知会员服务

116+阅读 · 2019年12月30日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

【ACM MM 2019 】MMGCN：用于微视频个性化推荐的多模图卷积网络（MMGCN：Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video）

【ACM MM 2019 】MMGCN：用于微视频个性化推荐的多模图卷积网络（MMGCN：Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video）

专知会员服务

57+阅读 · 2019年11月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

内涵网络嵌入：Content-rich Network Embedding

内涵网络嵌入：Content-rich Network Embedding

我爱读PAMI

4+阅读 · 2019年11月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

carla 学习笔记

carla 学习笔记

CreateAMind

9+阅读 · 2018年2月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Arxiv

3+阅读 · 2021年1月29日

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Arxiv

12+阅读 · 2020年8月11日

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Arxiv

3+阅读 · 2020年5月13日

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Arxiv

3+阅读 · 2020年3月17日

Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering

Arxiv

16+阅读 · 2019年12月16日

Structured Query Construction via Knowledge Graph Embedding

Structured Query Construction via Knowledge Graph Embedding

Arxiv

6+阅读 · 2019年9月6日

Dialogue Natural Language Inference

Arxiv

7+阅读 · 2018年11月1日

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Arxiv

3+阅读 · 2018年9月15日

Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks

Arxiv

6+阅读 · 2018年9月6日

End-to-End Video Classification with Knowledge Graphs

Arxiv

4+阅读 · 2017年11月6日

VIP会员

文章信息

相关主题

任务对话系统

图注意力网络

注意力机制

相关VIP内容

AAAI2021 | 图神经网络的异质图结构学习，Heterogeneous Graph Structure Learning for Graph Neural Networks

专知会员服务

92+阅读 · 2021年1月20日

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【芝加哥大学】GRAPH-BERT: Only Attention is Needed for Learning Graph Representations

【芝加哥大学】GRAPH-BERT: Only Attention is Needed for Learning Graph Representations

专知会员服务

85+阅读 · 2020年1月15日

【论文】双曲图卷积神经网络（Hyperbolic Graph Convolutional Neural Networks），斯坦福大学| Ines Chami，斯坦福大学| Rex Ying

【论文】双曲图卷积神经网络（Hyperbolic Graph Convolutional Neural Networks），斯坦福大学| Ines Chami，斯坦福大学| Rex Ying

专知会员服务

116+阅读 · 2019年12月30日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

【ACM MM 2019 】MMGCN：用于微视频个性化推荐的多模图卷积网络（MMGCN：Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video）

【ACM MM 2019 】MMGCN：用于微视频个性化推荐的多模图卷积网络（MMGCN：Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video）

专知会员服务

57+阅读 · 2019年11月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】扩展可扩展会话推荐的边界

别想太多：高效 R1 风格大型推理模型综述

【ACMMM2025】EvoVLMA: 进化式视觉-语言模型自适应

智能体网络：用AI智能体编织下一代网络

相关资讯

内涵网络嵌入：Content-rich Network Embedding

内涵网络嵌入：Content-rich Network Embedding

我爱读PAMI

4+阅读 · 2019年11月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

carla 学习笔记

carla 学习笔记

CreateAMind

9+阅读 · 2018年2月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Arxiv

3+阅读 · 2021年1月29日

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Arxiv

12+阅读 · 2020年8月11日

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Arxiv

3+阅读 · 2020年5月13日

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Arxiv

3+阅读 · 2020年3月17日

Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering

Arxiv

16+阅读 · 2019年12月16日

Structured Query Construction via Knowledge Graph Embedding

Structured Query Construction via Knowledge Graph Embedding

Arxiv

6+阅读 · 2019年9月6日

Dialogue Natural Language Inference

Arxiv

7+阅读 · 2018年11月1日

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Arxiv

3+阅读 · 2018年9月15日

Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks

Arxiv

6+阅读 · 2018年9月6日

End-to-End Video Classification with Knowledge Graphs

Arxiv

4+阅读 · 2017年11月6日

微信扫码咨询专知VIP会员