How do humans recognize the action "opening a book" ? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on both Charades and Something-Something datasets. Especially for Charades, we obtain a huge 4.4% gain when our model is applied in complex environments.

3
下载
关闭预览

相关内容

ACM/IEEE第23届模型驱动工程语言和系统国际会议,是模型驱动软件和系统工程的首要会议系列,由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来,模型涵盖了建模的各个方面,从语言和方法到工具和应用程序。模特的参加者来自不同的背景,包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛,参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会,并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。 官网链接:http://www.modelsconference.org/

Knowledge graph (KG) embedding encodes the entities and relations from a KG into low-dimensional vector spaces to support various applications such as KG completion, question answering, and recommender systems. In real world, knowledge graphs (KGs) are dynamic and evolve over time with addition or deletion of triples. However, most existing models focus on embedding static KGs while neglecting dynamics. To adapt to the changes in a KG, these models need to be re-trained on the whole KG with a high time cost. In this paper, to tackle the aforementioned problem, we propose a new context-aware Dynamic Knowledge Graph Embedding (DKGE) method which supports the embedding learning in an online fashion. DKGE introduces two different representations (i.e., knowledge embedding and contextual element embedding) for each entity and each relation, in the joint modeling of entities and relations as well as their contexts, by employing two attentive graph convolutional networks, a gate strategy, and translation operations. This effectively helps limit the impacts of a KG update in certain regions, not in the entire graph, so that DKGE can rapidly acquire the updated KG embedding by a proposed online learning algorithm. Furthermore, DKGE can also learn KG embedding from scratch. Experiments on the tasks of link prediction and question answering in a dynamic environment demonstrate the effectiveness and efficiency of DKGE.

0
8
下载
预览

Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and their relations between two proposals as an edge. Here, we use two types of relations, one for capturing the context information for each proposal and the other one for characterizing the correlations between distinct actions. Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for the action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships. Codes are available at https://github.com/Alvin-Zeng/PGCN.

0
5
下载
预览

Learning graph-structured data with graph neural networks (GNNs) has been recently emerging as an important field because of its wide applicability in bioinformatics, chemoinformatics, social network analysis and data mining. Recent GNN algorithms are based on neural message passing, which enables GNNs to integrate local structures and node features recursively. However, past GNN algorithms based on 1-hop neighborhood neural message passing are exposed to a risk of loss of information on local structures and relationships. In this paper, we propose Neighborhood Edge AggregatoR (NEAR), a novel framework that aggregates relations between the nodes in the neighborhood via edges. NEAR, which can be orthogonally combined with previous GNN algorithms, gives integrated information that describes which nodes in the neighborhood are connected. Therefore, GNNs combined with NEAR reflect each node's local structure beyond the nodes themselves. Experimental results on multiple graph classification tasks show that our algorithm achieves state-of-the-art results.

0
5
下载
预览

Paragraph-style image captions describe diverse aspects of an image as opposed to the more common single-sentence captions that only provide an abstract description of the image. These paragraph captions can hence contain substantial information of the image for tasks such as visual question answering. Moreover, this textual information is complementary with visual information present in the image because it can discuss both more abstract concepts and more explicit, intermediate symbolic information about objects, events, and scenes that can directly be matched with the textual question and copied into the textual answer (i.e., via easier modality match). Hence, we propose a combined Visual and Textual Question Answering (VTQA) model which takes as input a paragraph caption as well as the corresponding image, and answers the given question based on both inputs. In our model, the inputs are fused to extract related information by cross-attention (early fusion), then fused again in the form of consensus (late fusion), and finally expected answers are given an extra score to enhance the chance of selection (later fusion). Empirical results show that paragraph captions, even when automatically generated (via an RL-based encoder-decoder model), help correctly answer more visual questions. Overall, our joint model, when trained on the Visual Genome dataset, significantly improves the VQA performance over a strong baseline model.

0
6
下载
预览

In order to answer semantically-complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects. We propose a Relation-aware Graph Attention Network (ReGAT), which encodes each image into a graph and models multi-type inter-object relations via a graph attention mechanism, to learn question-adaptive relation representations. Two types of visual object relations are explored: (i) Explicit Relations that represent geometric positions and semantic interactions between objects; and (ii) Implicit Relations that capture the hidden dynamics between image regions. Experiments demonstrate that ReGAT outperforms prior state-of-the-art approaches on both VQA 2.0 and VQA-CP v2 datasets. We further show that ReGAT is compatible to existing VQA architectures, and can be used as a generic relation encoder to boost the model performance for VQA.

0
4
下载
预览

We explore the use of a knowledge graphs, that capture general or commonsense knowledge, to augment the information extracted from images by the state-of-the-art methods for image captioning. The results of our experiments, on several benchmark data sets such as MS COCO, as measured by CIDEr-D, a performance metric for image captioning, show that the variants of the state-of-the-art methods for image captioning that make use of the information extracted from knowledge graphs can substantially outperform those that rely solely on the information extracted from images.

0
5
下载
预览

Graphs, which describe pairwise relations between objects, are essential representations of many real-world data such as social networks. In recent years, graph neural networks, which extend the neural network models to graph data, have attracted increasing attention. Graph neural networks have been applied to advance many different graph related tasks such as reasoning dynamics of the physical system, graph classification, and node classification. Most of the existing graph neural network models have been designed for static graphs, while many real-world graphs are inherently dynamic. For example, social networks are naturally evolving as new users joining and new relations being created. Current graph neural network models cannot utilize the dynamic information in dynamic graphs. However, the dynamic information has been proven to enhance the performance of many graph analytical tasks such as community detection and link prediction. Hence, it is necessary to design dedicated graph neural networks for dynamic graphs. In this paper, we propose DGNN, a new {\bf D}ynamic {\bf G}raph {\bf N}eural {\bf N}etwork model, which can model the dynamic information as the graph evolving. In particular, the proposed framework can keep updating node information by capturing the sequential information of edges, the time intervals between edges and information propagation coherently. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.

0
13
下载
预览

Graph-based semi-supervised learning (SSL) is an important learning problem where the goal is to assign labels to initially unlabeled nodes in a graph. Graph Convolutional Networks (GCNs) have recently been shown to be effective for graph-based SSL problems. GCNs inherently assume existence of pairwise relationships in the graph-structured data. However, in many real-world problems, relationships go beyond pairwise connections and hence are more complex. Hypergraphs provide a natural modeling tool to capture such complex relationships. In this work, we explore the use of GCNs for hypergraph-based SSL. In particular, we propose HyperGCN, an SSL method which uses a layer-wise propagation rule for convolutional neural networks operating directly on hypergraphs. To the best of our knowledge, this is the first principled adaptation of GCNs to hypergraphs. HyperGCN is able to encode both the hypergraph structure and hypernode features in an effective manner. Through detailed experimentation, we demonstrate HyperGCN's effectiveness at hypergraph-based SSL.

0
9
下载
预览

Most research in reading comprehension has focused on answering questions based on individual documents or even single paragraphs. We introduce a method which integrates and reasons relying on information spread within documents and across multiple documents. We frame it as an inference problem on a graph. Mentions of entities are nodes of this graph where edges encode relations between different mentions (e.g., within- and cross-document co-references). Graph convolutional networks (GCNs) are applied to these graphs and trained to perform multi-step reasoning. Our Entity-GCN method is scalable and compact, and it achieves state-of-the-art results on the WikiHop dataset (Welbl et al. 2017).

0
3
下载
预览

Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic content shared across the two domains. We introduce XGAN ("Cross-GAN"), a dual adversarial autoencoder, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions. We exploit ideas from the domain adaptation literature and define a semantic consistency loss which encourages the model to preserve semantics in the learned embedding space. We report promising qualitative results for the task of face-to-cartoon translation. The cartoon dataset we collected for this purpose is in the process of being released as a new benchmark for semantic style transfer.

0
3
下载
预览
小贴士
相关论文
Efficiently Embedding Dynamic Knowledge Graphs
Tianxing Wu,Arijit Khan,Huan Gao,Cheng Li
8+阅读 · 2019年10月15日
Runhao Zeng,Wenbing Huang,Mingkui Tan,Yu Rong,Peilin Zhao,Junzhou Huang,Chuang Gan
5+阅读 · 2019年9月7日
NEAR: Neighborhood Edge AggregatoR for Graph Classification
Cheolhyeong Kim,Haeseong Moon,Hyung Ju Hwang
5+阅读 · 2019年9月6日
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Hyounghun Kim,Mohit Bansal
6+阅读 · 2019年6月14日
Linjie Li,Zhe Gan,Yu Cheng,Jingjing Liu
4+阅读 · 2019年3月29日
Yimin Zhou,Yiwei Sun,Vasant Honavar
5+阅读 · 2019年1月25日
Yao Ma,Ziyi Guo,Zhaochun Ren,Eric Zhao,Jiliang Tang,Dawei Yin
13+阅读 · 2018年10月24日
HyperGCN: Hypergraph Convolutional Networks for Semi-Supervised Classification
Naganand Yadati,Madhav Nimishakavi,Prateek Yadav,Anand Louis,Partha Talukdar
9+阅读 · 2018年9月7日
Question Answering by Reasoning Across Documents with Graph Convolutional Networks
Nicola De Cao,Wilker Aziz,Ivan Titov
3+阅读 · 2018年8月29日
Amélie Royer,Konstantinos Bousmalis,Stephan Gouws,Fred Bertsch,Inbar Mosseri,Forrester Cole,Kevin Murphy
3+阅读 · 2018年4月25日
相关VIP内容
因果图,Causal Graphs,52页ppt
专知会员服务
107+阅读 · 2020年4月19日
专知会员服务
81+阅读 · 2020年3月12日
专知会员服务
38+阅读 · 2020年2月26日
《DeepGCNs: Making GCNs Go as Deep as CNNs》
专知会员服务
20+阅读 · 2019年10月17日
[综述]深度学习下的场景文本检测与识别
专知会员服务
29+阅读 · 2019年10月10日
计算机视觉最佳实践、代码示例和相关文档
专知会员服务
7+阅读 · 2019年10月9日
相关资讯
计算机 | 入门级EI会议ICVRIS 2019诚邀稿件
Call4Papers
7+阅读 · 2019年6月24日
无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019
PaperWeekly
5+阅读 · 2019年5月5日
Call for Participation: Shared Tasks in NLPCC 2019
中国计算机学会
5+阅读 · 2019年3月22日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
10+阅读 · 2018年12月24日
人工智能 | 国际会议信息10条
Call4Papers
5+阅读 · 2018年12月18日
disentangled-representation-papers
CreateAMind
20+阅读 · 2018年9月12日
计算机视觉近一年进展综述
机器学习研究会
6+阅读 · 2017年11月25日
Capsule Networks解析
机器学习研究会
10+阅读 · 2017年11月12日
MoCoGAN 分解运动和内容的视频生成
CreateAMind
11+阅读 · 2017年10月21日
【今日新增】IEEE Trans.专刊截稿信息8条
Call4Papers
4+阅读 · 2017年6月29日
Top