MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection

We present the Multiview Extended Video with Activities (MEVA) dataset, a new, very large-scale dataset for human activity recognition. Existing security datasets either focus on activity counts by aggregating public video that was disseminated because of its content, and therefore typically lack same-scene background video, or they achieve persistence by observing public areas and thus cannot control for activity content. Our dataset contains over 9300 hours of untrimmed, continuous video, scripted to include diverse, simultaneous activities alongside spontaneous background activity. We have annotated 144 hours for 37 activity types, marking bounding boxes of actors and props. Over a three-week period at an access-controlled venue, our collection observed approximately 100 actors performing scripted scenarios and spontaneous background activity, recording in multiple modalities from overlapping and non-overlapping indoor and outdoor viewpoints. The resulting data includes video from 38 RGB and thermal IR cameras, 42 hours of UAV footage, and GPS locations for the actors. Of the 144 annotated hours, 122 are sequestered in support of the NIST Activity in Extended Video (ActEV) challenge; the remaining 22 hours of annotation and the corresponding video are available on our website, along with an additional 306 hours of ground camera data, 4.6 hours of UAV data, and 9.6 hours of GPS logs. Additional derived data includes camera models that geo-register the outdoor cameras and a dense 3D point cloud model of the outdoor scene. The data was collected with IRB oversight and approval and is released under a CC-BY-4.0 license.
