Dense video captioning is a fine-grained video understanding task that involves two sub-problems: localizing distinct events in a long video stream, and generating captions for the localized events. We propose the Joint Event Detection and Description Network (JEDDi-Net), which solves the dense video captioning task in an end-to-end fashion. Our model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on the pooled features, and generates their captions. Proposal features are extracted within each proposal segment via 3D Segment-of-Interest (SoI) pooling over the shared video feature encoding. To explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context. On the large-scale ActivityNet Captions dataset, JEDDi-Net demonstrates improved results as measured by standard metrics. We also present the first dense captioning results on the TACoS-MultiLevel dataset.
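To make the SoI pooling step concrete, here is a minimal sketch, not the authors' released implementation, of how variable-length temporal proposals can be cropped from a shared 3D feature volume and pooled to a fixed size. The function name `soi_pool`, the tensor shapes, and the use of PyTorch's adaptive max pooling are assumptions for exposition only.

```python
# Illustrative sketch of 3D Segment-of-Interest (SoI) pooling: each
# variable-length temporal proposal is cropped from a shared 3D-conv
# feature volume and pooled to a fixed-size representation.
# (Hypothetical helper; shapes and names are assumptions, not the paper's code.)
import torch
import torch.nn.functional as F

def soi_pool(features: torch.Tensor, proposals: torch.Tensor,
             out_t: int = 4, out_h: int = 2, out_w: int = 2) -> torch.Tensor:
    """features:  (C, T, H, W) shared encoding of one video.
    proposals: (N, 2) integer [start, end) indices on the temporal axis.
    Returns (N, C, out_t, out_h, out_w) fixed-size proposal features."""
    pooled = []
    for start, end in proposals.tolist():
        segment = features[:, start:end]  # crop the proposal's temporal span
        # Adaptive max pooling maps a segment of any length to a fixed grid,
        # analogous to RoI pooling but over (time, height, width).
        pooled.append(F.adaptive_max_pool3d(segment, (out_t, out_h, out_w)))
    return torch.stack(pooled)

# Example: a 512-channel encoding of 96 temporal steps, two proposals.
feats = torch.randn(512, 96, 7, 7)
props = torch.tensor([[0, 40], [32, 96]])
print(soi_pool(feats, props).shape)  # torch.Size([2, 512, 4, 2, 2])
```

Because every proposal is reduced to the same fixed grid regardless of its duration, all proposals can share a single downstream captioning head, which is what allows proposal generation and caption generation to be trained jointly end-to-end.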