Multi-task learning is commonly used in autonomous driving to solve various visual perception tasks, offering significant benefits in both performance and computational complexity. Current work on multi-task learning networks focuses on processing a single input image, and there is no known implementation of multi-task learning that handles a sequence of images. In this work, we propose a multi-stream multi-task network that takes advantage of feature representations from preceding frames in a video sequence for joint learning of segmentation, depth, and motion. The weights of the current and previous encoders are shared, so features computed for the previous frame can be leveraged without additional computation. In addition, we propose the geometric mean of task losses as a better alternative to the weighted average of task losses; the proposed loss function better handles differences in the convergence rates of the tasks. Experimental results on the KITTI, Cityscapes, and SYNTHIA datasets demonstrate that the proposed strategies outperform various existing multi-task learning solutions.
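The geometric-mean loss combination mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the `eps` stabilizer are assumptions, and in practice the same formula would be applied to framework tensors so gradients flow through each task loss.

```python
import math

def geometric_mean_loss(task_losses, eps=1e-8):
    """Combine per-task losses via their geometric mean.

    Computed as exp(mean(log L_i)), which equals (prod L_i)^(1/n).
    Unlike a weighted average, a task whose loss shrinks quickly
    contributes a proportionally smaller factor, which helps balance
    tasks with different convergence rates.
    """
    # eps guards against log(0) when a task loss collapses to zero.
    logs = [math.log(loss + eps) for loss in task_losses]
    return math.exp(sum(logs) / len(logs))
```

For example, with segmentation, depth, and motion losses of 1.0, 4.0, and 2.0, the combined loss is (1.0 * 4.0 * 2.0)^(1/3) = 2.0, regardless of the absolute scale differences between tasks.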