Deep neural networks achieve remarkable performance in many computer vision tasks. Most state-of-the-art (SOTA) semantic segmentation and object detection approaches reuse neural network architectures designed for image classification as the backbone, commonly pre-trained on ImageNet. However, performance gains can be achieved by designing network architectures specifically for detection and segmentation, as shown by recent neural architecture search (NAS) research for detection and segmentation. One major challenge though is that ImageNet pre-training of the search space representation (a.k.a. super network) or the searched networks incurs huge computational cost. In this paper, we propose a Fast Network Adaptation (FNA++) method, which can adapt both the architecture and parameters of a seed network (e.g. an ImageNet pre-trained network) to become a network with different depths, widths, or kernel sizes via a parameter remapping technique, making it possible to use NAS for segmentation and detection tasks a lot more efficiently. In our experiments, we apply FNA++ on MobileNetV2 to obtain new networks for semantic segmentation, object detection, and human pose estimation that clearly outperform existing networks designed both manually and by NAS. We also implement FNA++ on ResNets and NAS networks, which demonstrates a great generalization ability. The total computation cost of FNA++ is significantly less than SOTA segmentation and detection NAS approaches: 1737x less than DPC, 6.8x less than Auto-DeepLab, and 8.0x less than DetNAS. A series of ablation studies are performed to demonstrate the effectiveness, and detailed analysis is provided for more insights into the working mechanism. Codes are available at https://github.com/JaminFong/FNA.
翻译:大部分最先进的艺术(SOTA)语义分解和对象检测方法都以重新利用神经网络结构来将图像分类为主干线,通常是在图像网络上预先培训的。然而,通过设计专门用于检测和分解的网络结构可以实现绩效增益,如最近的神经结构搜索(NAS)研究所显示的检测和分解。一个重大挑战是搜索空间代表(a.k.a.a.超级网络)或搜索网络的图像网络预培训将产生巨大的计算成本。在本文中,我们提出一种快速网络适应(GNA++)方法,用于将图像分类作为主干线,通常是在图像网络(例如图像网络预培训网络)的架构和参数进行改造,以便通过参数再映射技术,成为一个深度、宽度或内核大小不同的网络。在搜索空间代表(a.k.a.k.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.b.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.ax.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.