The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification

The key to solving fine-grained image categorization is finding discriminative, local regions that correspond to subtle visual traits. Great strides have been made, with complex networks designed specifically to learn part-level discriminative feature representations. In this paper, we show that it is possible to cultivate subtle details without overly complicated network designs or training mechanisms: a single loss is all it takes. The main trick lies in how we delve into individual feature channels early on, as opposed to the convention of starting from a consolidated feature map. The proposed loss function, termed mutual-channel loss (MC-Loss), consists of two channel-specific components: a discriminality component and a diversity component. The discriminality component forces all feature channels belonging to the same class to be discriminative, through a novel channel-wise attention mechanism. The diversity component additionally constrains the channels so that they become mutually exclusive in the spatial dimension. The end result is a set of feature channels, each of which reflects a different locally discriminative region for a specific class. A network using MC-Loss can be trained end-to-end, without the need for any bounding-box or part annotations, and yields highly discriminative regions during inference. Experimental results show that MC-Loss, when implemented on top of common base networks, achieves state-of-the-art performance on all four fine-grained categorization datasets (CUB-Birds, FGVC-Aircraft, Flowers-102, and Stanford-Cars). Ablation studies further demonstrate the superiority of MC-Loss over other recently proposed general-purpose losses for visual classification, on two different base networks. Code is available at https://github.com/dongliangchang/Mutual-Channel-Loss
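
To make the two components concrete, here is a minimal pure-Python sketch of how they might be computed for one image. It is not the authors' implementation (see the repository linked above for that). The abstract does not spell out the pooling and normalization steps, so these are assumptions: each class owns a fixed group of channels, a spatial softmax normalizes each channel, and an element-wise maximum is taken across the channels of a group. The names `groups`, `masks`, and `lam` are hypothetical, and the optional 0/1 `masks` argument only stands in for the channel-wise attention step.

```python
import math

def spatial_softmax(channel):
    """Softmax over the flattened spatial positions of one channel."""
    peak = max(channel)
    exps = [math.exp(v - peak) for v in channel]
    total = sum(exps)
    return [e / total for e in exps]

def cross_channel_max(channels):
    """Element-wise maximum across a group of channels."""
    return [max(values) for values in zip(*channels)]

def diversity(groups):
    """Mean over classes of the summed cross-channel max of spatial softmaxes.

    Close to 1 when the channels of a class attend to the same position,
    approaching the number of channels per class when they attend to
    disjoint positions.
    """
    scores = []
    for channels in groups:
        normalized = [spatial_softmax(c) for c in channels]
        scores.append(sum(cross_channel_max(normalized)))
    return sum(scores) / len(scores)

def discriminality(groups, label, masks=None):
    """Cross-entropy on per-class logits pooled from each class's channels.

    masks (assumption): one 0/1 flag per channel, a stand-in for the
    channel-wise attention step that suppresses some channels in training.
    """
    logits = []
    for g, channels in enumerate(groups):
        if masks is not None:
            channels = [[v * m for v in c] for c, m in zip(channels, masks[g])]
        pooled = cross_channel_max(channels)
        logits.append(sum(pooled) / len(pooled))  # global average pooling
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]

def mc_loss(groups, label, lam=1.0, masks=None):
    """Assumed combination: penalize misclassification, reward diversity."""
    return discriminality(groups, label, masks) - lam * diversity(groups)

# Toy input: 2 classes x 2 channels per class x 4 spatial positions.
overlapping = [[[5, 0, 0, 0], [5, 0, 0, 0]], [[0, 0, 0, 5], [0, 0, 0, 5]]]
disjoint = [[[5, 0, 0, 0], [0, 5, 0, 0]], [[0, 0, 5, 0], [0, 0, 0, 5]]]
confident = [[[5, 0, 0, 0], [0, 5, 0, 0]], [[0, 0, 1, 0], [0, 0, 0, 1]]]
```

In this toy setup, `diversity(disjoint)` is larger than `diversity(overlapping)`, matching the goal of spatially mutually exclusive channels, and `discriminality(confident, 0)` is smaller than `discriminality(confident, 1)` because the channels of class 0 respond more strongly.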
