Efficient Context Integration through Factorized Pyramidal Learning for Ultra-Lightweight Semantic Segmentation

Semantic segmentation is a pixel-level prediction task that assigns a class label to every pixel of the input image. Deep learning models such as convolutional neural networks (CNNs) have achieved excellent performance in this domain. However, mobile applications such as autonomous driving demand real-time processing of an incoming stream of images, so architectures must be efficient as well as accurate. Because the accuracy and model size of CNNs are intrinsically in tension, the challenge is to strike a good trade-off between the two. To address this, we propose a novel Factorized Pyramidal Learning (FPL) module that aggregates rich contextual information efficiently. On the one hand, it uses a bank of convolutional filters with multiple dilation rates, which yields multi-scale context aggregation, crucial for better accuracy. On the other hand, a careful factorization of these filters reduces the number of parameters, crucial for lightweight models. Moreover, we decompose the spatial pyramid into two stages, which enables a simple and efficient feature fusion within the module that addresses the notorious checkerboard effect. We also design a dedicated Feature-Image Reinforcement (FIR) unit that fuses shallow and deep features with downsampled versions of the input image, improving accuracy without increasing the number of model parameters. Building on the FPL module and the FIR unit, we propose an ultra-lightweight real-time network, FPLNet, which achieves a state-of-the-art accuracy-efficiency trade-off. Specifically, with fewer than 0.5 million parameters, FPLNet achieves 66.93\% and 66.28\% mIoU on the Cityscapes validation and test sets, respectively, and runs at 95.5 frames per second (FPS).
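The abstract does not give the FPL module's layer configuration. The sketch below is therefore only back-of-the-envelope arithmetic under assumed settings. It shows why factorizing the filters in a dilated filter bank cuts parameters, while larger dilation rates widen each filter's span at no extra parameter cost. The channel width, kernel size, dilation rates, and the specific k×1 followed by 1×k (optionally depthwise) factorization are hypothetical placeholders, not FPLNet's published design.

```python
"""Parameter and receptive-field arithmetic for a bank of dilated convolutions.

Assumptions (illustrative only, not FPLNet's published configuration):
- every branch maps C input channels to C output channels, without bias;
- "factorized" replaces a k x k kernel with a k x 1 kernel followed by a 1 x k kernel;
- "factorized depthwise" applies those 1-D kernels per channel (groups = C).
"""
from typing import Callable, Sequence


def standard_params(k: int, c: int) -> int:
    # A full k x k convolution from C to C channels has k*k*C*C weights.
    return k * k * c * c


def factorized_params(k: int, c: int) -> int:
    # A k x 1 followed by a 1 x k convolution, each C -> C: 2*k*C*C weights.
    return 2 * k * c * c


def factorized_depthwise_params(k: int, c: int) -> int:
    # Per-channel k x 1 and 1 x k kernels: 2*k*C weights.
    return 2 * k * c


def effective_kernel(k: int, d: int) -> int:
    # A k-tap kernel with dilation d spans k + (k - 1)(d - 1) pixels per axis.
    return k + (k - 1) * (d - 1)


def bank_params(rates: Sequence[int], k: int, c: int,
                per_branch: Callable[[int, int], int]) -> int:
    # Dilation changes where the taps land, not how many weights there are,
    # so every branch of the bank costs the same.
    return len(rates) * per_branch(k, c)


if __name__ == "__main__":
    rates = [1, 2, 4, 8]  # hypothetical dilation rates
    k, c = 3, 32          # hypothetical kernel size and channel width
    for name, fn in [("standard", standard_params),
                     ("factorized", factorized_params),
                     ("factorized depthwise", factorized_depthwise_params)]:
        print(f"{name:>22}: {bank_params(rates, k, c, fn)} weights")
    print("spans per axis:", [effective_kernel(k, d) for d in rates])
```

With these placeholder values, the four-branch bank needs 36,864 weights with standard 3×3 kernels, 24,576 with 3×1/1×3 factorization, and 768 when the factorized kernels are also depthwise. Meanwhile, the per-axis span grows from 3 to 5, 9 and 17 pixels as the dilation rate increases.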