Online action detection has attracted increasing research interest in recent years. Current works model historical dependencies and anticipate the future to perceive how an action evolves within a video segment, thereby improving detection accuracy. However, the existing paradigm ignores category-level modeling and pays insufficient attention to efficiency. For a given category, its representative frames exhibit diverse characteristics. Thus, category-level modeling can provide complementary guidance to temporal dependency modeling. This paper develops an effective exemplar-consultation mechanism that first measures the similarity between a frame and exemplary frames, and then aggregates exemplary features based on the similarity weights. The mechanism is also efficient, as both similarity measurement and feature aggregation require limited computation. Based on the exemplar-consultation mechanism, long-term dependencies can be captured by treating historical frames as exemplars, while category-level modeling can be achieved by treating representative frames from a category as exemplars. Owing to the complementarity of category-level modeling, our method employs a lightweight architecture yet achieves new state-of-the-art performance on three benchmarks. In addition, when a spatio-temporal network is used to process video frames, our method achieves a good trade-off between effectiveness and efficiency. Code is available at https://github.com/VividLe/Online-Action-Detection.
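The two-step exemplar-consultation mechanism (measure similarity, then aggregate exemplar features by similarity weight) can be illustrated with a minimal, framework-free sketch. It assumes cosine similarity and a temperature-scaled softmax to turn similarities into weights; these choices, the `temperature` parameter, the function names, and the toy feature vectors and category names are illustrative assumptions, not the authors' implementation, which lives in the linked repository.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float], temperature: float = 1.0) -> Vector:
    """Numerically stable softmax; a lower temperature sharpens the weights."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def exemplar_consult(
    query: Sequence[float],
    exemplars: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed hyperparameter, not taken from the paper
) -> Vector:
    """Consult exemplars for one query frame.

    Step 1: measure the similarity between the query and every exemplar.
    Step 2: aggregate exemplar features with the normalized similarity weights.
    """
    if not exemplars:
        raise ValueError("need at least one exemplar")
    weights = softmax([cosine_similarity(query, e) for e in exemplars], temperature)
    dim = len(query)
    return [sum(w * e[d] for w, e in zip(weights, exemplars)) for d in range(dim)]


if __name__ == "__main__":
    current = [1.0, 0.0, 0.0]  # toy feature of the current frame

    # Long-term dependencies: the current frame consults earlier frames as exemplars.
    history = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
    print("history-aware feature:", exemplar_consult(current, history))

    # Category-level modeling: the frame consults representative frames per category.
    # Category names and exemplar vectors are hypothetical toy values.
    category_exemplars = {
        "background": [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]],
        "jump": [[1.0, 0.0, 0.0], [0.7, 0.3, 0.0]],
    }
    for name, exemplars in category_exemplars.items():
        print(name, exemplar_consult(current, exemplars))
```

With N exemplars of dimension D, both steps in this sketch cost O(N·D) per query frame, which is in line with the abstract's point that similarity measurement and feature aggregation need only limited computation. The same function serves both roles; only the choice of exemplars (historical frames versus category representatives) changes.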