Neural networks require large amounts of memory and compute to process high-resolution images, even when only a small part of the image is informative for the task at hand. We propose a method based on a differentiable Top-K operator that selects the most relevant parts of the input, so that high-resolution images can be processed efficiently. Our method can be interfaced with any downstream neural network, aggregates information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object or part bounding box annotations during training.
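The abstract does not say how the Top-K operator is made differentiable, so the following is only an illustrative sketch of the general idea: score patches, pick K of them, and pass a smooth selection to a downstream model. It assumes one possible relaxation, averaging hard Top-K indicator matrices over Gaussian-perturbed scores, which may or may not match the authors' operator. All function names (`extract_patches`, `hard_top_k_indicators`, `perturbed_top_k`, `soft_select`) and the parameters `sigma` and `num_samples` are hypothetical, and only the forward pass is shown; training end-to-end would additionally require an autodiff framework and a gradient estimator for the smoothed selection.

```python
import numpy as np


def extract_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping square patches.

    Returns an array of shape (num_patches, patch_size, patch_size, C),
    ordered row by row over the patch grid.
    """
    h, w = image.shape[:2]
    rows, cols = h // patch_size, w // patch_size
    cropped = image[: rows * patch_size, : cols * patch_size]
    grid = cropped.reshape(rows, patch_size, cols, patch_size, -1)
    return grid.transpose(0, 2, 1, 3, 4).reshape(
        rows * cols, patch_size, patch_size, -1
    )


def hard_top_k_indicators(scores, k):
    """Return a (k, n) 0/1 matrix whose rows are one-hot selections.

    The k highest-scoring indices are kept in their original spatial order.
    """
    top = np.sort(np.argsort(scores)[-k:])
    indicators = np.zeros((k, scores.shape[0]))
    indicators[np.arange(k), top] = 1.0
    return indicators


def perturbed_top_k(scores, k, sigma=0.05, num_samples=500, seed=0):
    """Smooth Top-K (assumed relaxation): average hard selections under noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((num_samples, scores.shape[0]))
    samples = [hard_top_k_indicators(scores + sigma * z, k) for z in noise]
    return np.mean(samples, axis=0)


def soft_select(weights, patches):
    """Combine patches with the (k, n) selection weights into k soft patches."""
    return np.einsum("kn,nhwc->khwc", weights, patches)


# Toy usage: an 8x8 RGB "image", 4x4 patches, and made-up relevance scores
# (in a real model a small scoring network would produce these).
image = np.random.default_rng(1).random((8, 8, 3))
patches = extract_patches(image, patch_size=4)
scores = np.array([0.1, 0.9, 0.2, 0.8])
weights = perturbed_top_k(scores, k=2)
selected = soft_select(weights, patches)  # shape (2, 4, 4, 3)
```

Each row of `weights` sums to one, so every output patch is a convex combination of input patches; as `sigma` shrinks the rows approach one-hot vectors and the operation approaches a hard crop. The `selected` array is what a downstream network would consume in this sketch.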