Unsupervised monocular depth estimation models use adjacent frames as a supervisory signal during training. However, temporally correlated frames are also available at inference time in many clinical applications, e.g., surgical navigation. The vast majority of monocular systems do not exploit this valuable signal, which could be used to enhance the depth estimates. Those that do achieve only limited gains because of the unique challenges of endoscopic scenes, such as low and homogeneous textures and inter-frame brightness fluctuations. In this work, we present SMUDLP, a novel unsupervised paradigm for multi-frame monocular endoscopic depth estimation. SMUDLP integrates a learnable patchmatch module to adaptively increase the discriminative ability in low-texture and homogeneous-texture regions, and enforces cross-teaching and self-teaching consistencies to provide effective regularization against brightness fluctuations. Our detailed experiments on both the SCARED and Hamlyn datasets indicate that SMUDLP exceeds state-of-the-art competitors by a large margin, including those that use single or multiple frames at inference time. The source code and trained models will be made publicly available upon acceptance.
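To make the brightness-fluctuation challenge concrete, the following minimal sketch shows why a plain photometric error between adjacent frames can mislead training. It is an illustration under stated assumptions, not the SMUDLP loss: the function names `photometric_l1` and `scale_brightness`, the mean-absolute-difference error, and the 3x3 toy frame are all hypothetical choices made here.

```python
def photometric_l1(target, warped):
    """Mean absolute intensity difference between two equally sized grayscale images."""
    total = 0.0
    count = 0
    for row_t, row_w in zip(target, warped):
        for a, b in zip(row_t, row_w):
            total += abs(a - b)
            count += 1
    return total / count


def scale_brightness(image, gain):
    """Simulate an illumination change by scaling every pixel, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p * gain)) for p in row] for row in image]


# Hypothetical 3x3 grayscale frame with intensities in [0, 1] (assumption).
frame = [
    [0.20, 0.25, 0.30],
    [0.25, 0.30, 0.35],
    [0.30, 0.35, 0.40],
]

# Perfectly aligned adjacent frame with identical lighting: the error is zero.
same_light = photometric_l1(frame, frame)

# Perfectly aligned adjacent frame that is 20% brighter: the error is nonzero
# even though the geometry, and therefore the depth, is exactly right.
brighter = photometric_l1(frame, scale_brightness(frame, 1.2))

print(same_light, brighter)
```

In this toy case the second error comes entirely from the lighting change, so a depth network trained on this signal alone would be penalized for a correct prediction. This is the kind of failure that additional consistency regularization is meant to counteract.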