In this paper, we propose a general framework for image classification that exploits the attention mechanism and global context, and that can be incorporated into various network architectures to improve their performance. To investigate the capability of the global context, we compare four mathematical models and observe that the global context encoded in a category-disentangled conditional generative model retains the richest information complementary to that in the baseline classification networks. Based on this observation, we define a novel Category Disentangled Global Context (CDGC) and devise a deep network to obtain it. By attending to CDGC, the baseline networks can identify the objects of interest more accurately, thus improving classification performance. We apply the framework to many different network architectures to demonstrate its effectiveness and versatility. Extensive results on five publicly available datasets validate that our approach generalizes well and is superior to the state of the art. In addition, the framework can be combined with various self-attention-based methods to further boost performance. Code and pretrained models will be made public upon paper acceptance.