There is growing interest in designing models that can handle images from different visual domains. If different visual domains share a universal structure that can be captured via a common parameterization, then we can use a single model for all domains rather than one model per domain. A model that is aware of the relationships between domains can also be trained to work on new domains with fewer resources. However, identifying the reusable structure in a model is not easy. In this paper, we propose a multi-domain learning architecture based on depthwise separable convolution. The proposed approach rests on the assumption that images from different domains share cross-channel correlations but have domain-specific spatial correlations. The proposed model is compact and incurs minimal overhead when applied to new domains. Additionally, we introduce a gating mechanism to promote soft sharing between different domains. We evaluate our approach on the Visual Decathlon Challenge, a benchmark for testing the ability of multi-domain models. The experiments show that our approach achieves the highest score while requiring only 50% of the parameters of state-of-the-art approaches.
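The sharing assumption maps naturally onto the two factors of a depthwise separable convolution: the depthwise step applies one spatial filter per channel (domain-specific spatial correlations), while the pointwise 1x1 step mixes channels (cross-channel correlations shared across domains). The following is a minimal NumPy sketch of that split under stated assumptions; the class name `MultiDomainSeparableBlock` and all shapes are illustrative, not taken from the paper.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """x: (C, H, W); kernels: (C, k, k) -- one spatial filter per channel."""
    C, H, W = x.shape
    k = kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for c in range(C):           # each channel is filtered independently
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernels[c])
    return out

def pointwise_conv(x, weights):
    """x: (C_in, H, W); weights: (C_out, C_in) -- 1x1 cross-channel mixing."""
    return np.tensordot(weights, x, axes=([1], [0]))

class MultiDomainSeparableBlock:
    """Hypothetical block: per-domain depthwise kernels, one shared
    pointwise weight matrix reused by every domain."""
    def __init__(self, n_domains, c_in, c_out, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # Domain-specific spatial filters (small overhead per new domain).
        self.depthwise = [rng.standard_normal((c_in, k, k))
                          for _ in range(n_domains)]
        # Cross-channel weights shared by all domains (the bulk of parameters).
        self.pointwise = rng.standard_normal((c_out, c_in))

    def forward(self, x, domain):
        return pointwise_conv(depthwise_conv(x, self.depthwise[domain]),
                              self.pointwise)
```

Adding a new domain under this factorization costs only a fresh set of `(C, k, k)` depthwise kernels, which is why the per-domain overhead stays small relative to the shared pointwise weights.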