Stereo vision is an effective technique for depth estimation with broad applicability in autonomous urban and highway driving. While various deep learning-based approaches have been developed for stereo, the input data from a binocular setup with a fixed baseline are limited. To address this limitation, we present an end-to-end network for processing data from a trinocular setup, which combines a narrow and a wide stereo pair. In this design, two pairs of binocular data with a common reference image are processed with shared network weights and a mid-level fusion. We also propose a Guided Addition method for merging the 4D data of the two baselines. Additionally, we present an iterative sequential self-supervised and supervised learning scheme on real and synthetic datasets, which makes training the trinocular system practical with no need for ground-truth data on the real dataset. Experimental results demonstrate that the trinocular disparity network outperforms a similar architecture fed with the individual pairs. Code and dataset: https://github.com/cogsys-tuebingen/tristereonet.
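As a rough illustration of why a fixed baseline is limiting, the following minimal sketch uses the standard rectified pinhole stereo relation d = f * B / Z (disparity d in pixels, focal length f in pixels, baseline B in meters, depth Z in meters). It is not taken from the paper's network or code; the function names and the focal length and baseline values are hypothetical and chosen only for the example.

```python
def disparity_px(depth_m, focal_px, baseline_m):
    """Disparity in pixels of a point at depth_m for a rectified stereo pair."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def rescale_disparity(disp_px, baseline_from_m, baseline_to_m):
    """Convert a disparity measured with one baseline to the equivalent
    disparity for another baseline sharing the same reference camera."""
    if baseline_from_m <= 0:
        raise ValueError("baseline must be positive")
    return disp_px * baseline_to_m / baseline_from_m


# Hypothetical rig: values are assumptions for illustration only.
FOCAL_PX = 1000.0
NARROW_BASELINE_M = 0.1
WIDE_BASELINE_M = 0.5

for depth in (2.0, 50.0):
    d_narrow = disparity_px(depth, FOCAL_PX, NARROW_BASELINE_M)
    d_wide = disparity_px(depth, FOCAL_PX, WIDE_BASELINE_M)
    print(f"depth {depth:5.1f} m: narrow {d_narrow:6.1f} px, wide {d_wide:6.1f} px")
```

Under these assumed values, a distant point produces only a few pixels of disparity on the narrow pair, so the wide pair resolves its depth more finely, while a nearby point produces a very large disparity on the wide pair, which enlarges the search range. Because both pairs share a reference image, their disparities differ only by the baseline ratio, which is what makes it possible to bring the two into a common disparity range before merging them.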