Zero-shot action recognition is the task of classifying action categories that are absent from the training set. In this setting, the standard evaluation protocol is to take an existing action recognition dataset (e.g. UCF101) and randomly split its classes into seen and unseen. However, most recent work builds on representations pre-trained on the Kinetics dataset, whose classes largely overlap with the classes in the zero-shot evaluation datasets. As a result, classes that are supposed to be unseen are present during supervised pre-training, which invalidates the premise of the zero-shot setting. A similar concern was raised several years ago for image-based zero-shot recognition, but it has not been considered by the zero-shot action recognition community. In this paper, we propose a new split for true zero-shot action recognition with no overlap between unseen test classes and training or pre-training classes. We benchmark several recent approaches on the proposed True Zero-Shot (TruZe) Split for UCF101 and HMDB51, with both zero-shot and generalized zero-shot evaluation. In our extensive analysis we find that our TruZe splits are significantly harder than comparable random splits because nothing leaks from pre-training: unseen performance is consistently lower, by up to 9.4% for zero-shot action recognition. In an additional evaluation we find that similar issues exist in the splits used in few-shot action recognition, where we see differences of up to 14.1%. We publish our splits and hope that our benchmark analysis will change how the field evaluates zero- and few-shot action recognition moving forward.
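The core constraint, that no unseen test class may also be a pre-training class, can be illustrated with a minimal sketch. This is not the procedure used to build the TruZe splits, which the abstract does not describe; the function names, the name-normalization rule, and the toy class lists below are all assumptions made for illustration.

```python
import random


def normalize(name):
    """Lowercase and keep only alphanumerics, so 'Yo Yo' matches 'yo_yo'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def overlap_free_split(dataset_classes, pretrain_classes, n_unseen, seed=0):
    """Return (seen, unseen) such that no unseen class name matches a
    pre-training class name after normalization.

    Assumption: overlap is detected by exact match of normalized names.
    Semantic overlap (differently named but equivalent actions) is NOT
    detected and would need manual review.
    """
    pretrain_keys = {normalize(c) for c in pretrain_classes}
    # Only classes absent from pre-training are eligible to be "unseen".
    clean = [c for c in dataset_classes if normalize(c) not in pretrain_keys]
    if n_unseen > len(clean):
        raise ValueError(
            f"only {len(clean)} overlap-free classes, cannot pick {n_unseen}"
        )
    rng = random.Random(seed)
    unseen = sorted(rng.sample(clean, n_unseen))
    unseen_set = set(unseen)
    # Everything else, including classes that overlap with pre-training,
    # goes to the seen (training) side.
    seen = [c for c in dataset_classes if c not in unseen_set]
    return seen, unseen


if __name__ == "__main__":
    # Toy, hypothetical class lists; not the real dataset label sets.
    dataset = ["Archery", "Yo Yo", "Pizza Tossing", "Biking", "Knitting"]
    pretrain = ["archery", "knitting", "riding_a_bike"]
    seen, unseen = overlap_free_split(dataset, pretrain, n_unseen=2)
    print("seen:  ", seen)
    print("unseen:", unseen)
```

In this toy example "Biking" stays eligible as an unseen class even though "riding_a_bike" is in the pre-training list, which shows why exact name matching alone is too weak and why a real split would also need a check for semantically equivalent classes.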