MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Spatiotemporal modeling and model complexity are the two most intensively studied topics in video action recognition. Existing state-of-the-art methods achieve excellent accuracy with little regard for complexity, while efficient spatiotemporal modeling solutions are slightly inferior in performance. In this paper, we attempt to achieve both efficiency and effectiveness. First, besides the traditional treatment of H x W x T video frames as a space-time signal (viewed from the Height-Width spatial plane), we propose to also model video from the other two planes, Height-Time and Width-Time, to capture the dynamics of video thoroughly. Second, our model is built on 2D CNN backbones and is designed with model complexity in mind. Specifically, we introduce a novel multi-view fusion (MVF) module that exploits video dynamics using separable convolution for efficiency. It is a plug-and-play module that can be inserted into off-the-shelf 2D CNNs to form a simple yet effective model called MVFNet. Moreover, MVFNet can be thought of as a generalized video modeling framework: under different settings it reduces to existing methods such as C2D, SlowOnly, and TSM. Extensive experiments on popular benchmarks (Something-Something V1 & V2, Kinetics, UCF-101, and HMDB-51) show its superiority. The proposed MVFNet achieves state-of-the-art performance with the complexity of a 2D CNN.
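
The idea of looking at a video from several views, and the claim that the framework reduces to methods such as C2D or TSM under particular settings, can be illustrated with a small pure-Python sketch. It is not the authors' implementation. It assumes a single channel stored as a nested list `x[T][H][W]`, models each view as a 3-tap zero-padded filter along one axis, and fuses the three responses by summation. The names `conv_along_axis`, `multi_view`, and `temporal_shift` are hypothetical, and how the actual MVF module splits channels, maps planes to kernels, and fuses views is not stated in the abstract.

```python
def conv_along_axis(x, kernel, axis):
    """Zero-padded 3-tap correlation along one axis of a [T][H][W] nested list.

    kernel holds the weights for offsets -1, 0, +1 along the chosen axis.
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])
    size = (T, H, W)[axis]
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for h in range(H):
            for w in range(W):
                idx = [t, h, w]
                total = 0.0
                for j, k in enumerate(kernel):
                    pos = idx[axis] + j - 1
                    if 0 <= pos < size:  # positions outside the volume count as zero
                        src = list(idx)
                        src[axis] = pos
                        total += k * x[src[0]][src[1]][src[2]]
                out[t][h][w] = total
    return out


def multi_view(x, k_t, k_h, k_w):
    """Sum of the per-axis responses along T, H and W.

    Fusing the views by addition is an assumption made for this sketch.
    """
    views = [conv_along_axis(x, k, axis) for axis, k in enumerate((k_t, k_h, k_w))]
    T, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(v[t][h][w] for v in views) for w in range(W)]
             for h in range(H)] for t in range(T)]


def temporal_shift(x):
    """TSM-style shift by one step: frame t receives frame t-1, frame 0 is zero."""
    H, W = len(x[0]), len(x[0][0])
    zero_frame = [[0.0] * W for _ in range(H)]
    return [zero_frame] + [[row[:] for row in frame] for frame in x[:-1]]


# A tiny 3-frame, 2x2 clip used to compare the settings.
clip = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]],
        [[9.0, 10.0], [11.0, 12.0]]]
off = [0.0, 0.0, 0.0]

# Fixed one-hot temporal kernel, spatial views switched off: a plain temporal shift.
shift_like = multi_view(clip, [1.0, 0.0, 0.0], off, off)
# Identity temporal kernel, spatial views switched off: no mixing across frames.
frame_wise = multi_view(clip, [0.0, 1.0, 0.0], off, off)
```

In this toy setting, `shift_like` matches `temporal_shift(clip)` and `frame_wise` matches `clip` itself, which is the sense in which fixed or disabled kernels recover shift-based or purely frame-wise (2D) behavior. Learned kernels on all three axes would give the general case.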

