The Non-Local Network (NLNet) pioneered an approach to capturing long-range dependencies within an image by aggregating query-specific global context to each query position. However, through rigorous empirical analysis, we found that the global contexts modeled by the non-local network are almost the same for different query positions. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet with significantly less computation. We further replace the one-layer transformation function of the non-local block with a two-layer bottleneck, which considerably reduces the number of parameters. The resulting network element, called the global context (GC) block, models global context in a lightweight manner, allowing it to be applied at multiple layers of a backbone network to form a global context network (GCNet). Experiments show that GCNet generally outperforms NLNet on major benchmarks for various recognition tasks. The code and network configurations are available at https://github.com/xvjiarui/GCNet.
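To make the query-independent formulation concrete, the following is a minimal pure-Python sketch of a GC-style block, not the released implementation. It assumes details the abstract does not state: the global context is pooled with a single softmax attention map shared by all positions, the bottleneck uses a ReLU between its two layers, any normalization is omitted, and the result is fused by addition. The names `gc_block`, `w_k`, `w_down`, and `w_up` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def gc_block(x, w_k, w_down, w_up):
    """Apply a simplified global context block (illustrative only).

    x      : N positions, each a list of C channel values
    w_k    : C weights scoring each position (assumed attention pooling)
    w_down : (C // r) x C matrix, first bottleneck layer
    w_up   : C x (C // r) matrix, second bottleneck layer
    """
    channels = len(x[0])

    # 1. Context modeling: one attention map for the whole image,
    #    computed once rather than once per query position.
    scores = [sum(w * v for w, v in zip(w_k, pos)) for pos in x]
    attn = softmax(scores)
    context = [
        sum(a * pos[c] for a, pos in zip(attn, x)) for c in range(channels)
    ]

    # 2. Two-layer bottleneck transform: C -> C/r -> C.
    hidden = [max(0.0, h) for h in matvec(w_down, context)]  # ReLU (assumed)
    delta = matvec(w_up, hidden)

    # 3. Fusion: the same context vector is added to every position.
    return [[v + d for v, d in zip(pos, delta)] for pos in x]
```

Because the context vector is computed once and broadcast, every position receives the same update, which is the property the empirical finding motivates. Ignoring biases, a one-layer C-to-C transform has C·C weights, whereas a bottleneck with reduction ratio r has 2·C·C/r, so any r greater than 2 reduces the parameter count.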