Voxel 知情语言定位 (Voxel-informed Language Grounding) - 专知论文

会员服务 ·

0

INFORMS · 讲稿 · MoDELS · 模型评估 · 3D ·

2022 年 5 月 19 日

Voxel-informed Language Grounding

翻译：Voxel 知情语言定位

Rodolfo Corona,Shizhan Zhu,Dan Klein,Trevor Darrell

from arxiv, ACL 2022

Natural language applied to natural 2D images describes a fundamentally 3D world. We present the Voxel-informed Language Grounder (VLG), a language grounding model that leverages 3D geometric information in the form of voxel maps derived from the visual input using a volumetric reconstruction model. We show that VLG significantly improves grounding accuracy on SNARE, an object reference game task. At the time of writing, VLG holds the top place on the SNARE leaderboard, achieving SOTA results with a 2.0% absolute improvement.

翻译：天然 2D 图像的自然语言描述一个根本的 3D 世界。我们展示了Voxel 知情语言地表仪(VLG), 这是一种语言定位模型,它利用了3D 几何信息, 其形式是用量体重建模型从视觉输入中提取的 voxel 地图。我们显示, VLG 显著提高了SNARE 的地基精确度, SNARE 是一个目标参考游戏任务。在编写本报告时, VLG 在 SNARE 首列上居首位, 实现了 SOTA 成果, 实现了2.0%的绝对改善。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新6篇图像描述生成相关论文—语言为枢纽、细粒度、生成器、注意力机制、策略梯度优化、判别性目标

【论文推荐】最新6篇图像描述生成相关论文—语言为枢纽、细粒度、生成器、注意力机制、策略梯度优化、判别性目标

专知

11+阅读 · 2018年3月20日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

基于单语语料的无监督统计机器翻译模型研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于知识迁移的跨领域人体动作识别

国家自然科学基金

5+阅读 · 2013年12月31日

三维椭圆方程Cauchy问题的正则化方法

国家自然科学基金

0+阅读 · 2013年12月31日

基于原子相干的非线性级联效应控制光孤子传输

国家自然科学基金

0+阅读 · 2013年12月31日

利用多面体模块构建新型荧光材料基质及其对发光性能的调控研究

国家自然科学基金

0+阅读 · 2012年12月31日

Explore and Match: A New Paradigm for Temporal Video Grounding with Natural Language

Arxiv

0+阅读 · 2022年7月7日

Efficient Self-supervised Vision Transformers for Representation Learning

Arxiv

0+阅读 · 2022年7月6日

Weakly Supervised Grounding for VQA in Vision-Language Transformers

Arxiv

0+阅读 · 2022年7月5日

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Arxiv

13+阅读 · 2021年4月7日

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Arxiv

13+阅读 · 2020年7月3日

VIP会员

文章信息

相关主题

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

前沿人工智能趋势报告（Frontier AI Trends Report）

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

Andrej Karpathy：2025 年 LLM 年度回顾（2025 LLM Year in Review）

音退化问题：基于输入操控的鲁棒语音转换综述

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新6篇图像描述生成相关论文—语言为枢纽、细粒度、生成器、注意力机制、策略梯度优化、判别性目标

【论文推荐】最新6篇图像描述生成相关论文—语言为枢纽、细粒度、生成器、注意力机制、策略梯度优化、判别性目标

专知

11+阅读 · 2018年3月20日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

相关论文

Explore and Match: A New Paradigm for Temporal Video Grounding with Natural Language

Arxiv

0+阅读 · 2022年7月7日

Efficient Self-supervised Vision Transformers for Representation Learning

Arxiv

0+阅读 · 2022年7月6日

Weakly Supervised Grounding for VQA in Vision-Language Transformers

Arxiv

0+阅读 · 2022年7月5日

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Arxiv

13+阅读 · 2021年4月7日

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Arxiv

13+阅读 · 2020年7月3日

相关基金

基于单语语料的无监督统计机器翻译模型研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于知识迁移的跨领域人体动作识别

国家自然科学基金

5+阅读 · 2013年12月31日

三维椭圆方程Cauchy问题的正则化方法

国家自然科学基金

0+阅读 · 2013年12月31日

基于原子相干的非线性级联效应控制光孤子传输

国家自然科学基金

0+阅读 · 2013年12月31日

利用多面体模块构建新型荧光材料基质及其对发光性能的调控研究

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员