Building Scalable Video Understanding Benchmarks through Sports

January 17, 2023

Aniket Agarwal, Alex Zhang, Karthik Narasimhan, Igor Gilitschenski, Vishvak Murahari, Yash Kant

Existing benchmarks for evaluating long video understanding fall short on multiple aspects, lacking in either scale or annotation quality. These limitations arise from the difficulty of collecting dense annotations for long videos (e.g., actions, dialogues), which are often obtained by manually labeling many frames per second. In this work, we introduce an automated Annotation and Video Stream Alignment Pipeline (abbreviated ASAP). We demonstrate the generality of ASAP by aligning unlabeled videos of four different sports (Cricket, Football, Basketball, and American Football) with their corresponding dense annotations (i.e., commentary) freely available on the web. Our human studies indicate that ASAP can align videos and annotations with high fidelity, precision, and speed. We then leverage ASAP's scalability to create LCric, a large-scale long video understanding benchmark with over 1000 hours of densely annotated long Cricket videos (with an average sample length of 50 minutes), collected at virtually zero annotation cost. We benchmark and analyze state-of-the-art video understanding models on LCric through a large set of compositional multi-choice and regression queries. We establish a human baseline that indicates significant room for new research to explore.
