We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method requires only monocular endoscopic video and a multi-view stereo method, e.g., structure from motion, to supervise learning in a sparse manner. Consequently, our method requires neither manual labeling nor patient computed tomography (CT) scans in either the training or application phase. In a cross-patient experiment using CT scans as ground truth, the proposed method achieved submillimeter root mean squared error. In a comparison on in vivo sinus endoscopy data against a recent self-supervised depth estimation method designed for natural video, we demonstrate that the proposed approach outperforms the previous method by a large margin. The source code for this work is publicly available online at https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch.
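To make the sparse-supervision idea concrete, here is a minimal sketch (not the paper's actual training objective) of how a dense depth prediction can be penalized only at pixels where a structure-from-motion reconstruction provides a 3-D point. The function name and the use of a plain masked L1 error are illustrative assumptions; the paper's loss formulation differs.

```python
import numpy as np

def sparse_depth_loss(pred_depth, sfm_depth, sfm_mask):
    """Hypothetical sparse supervision: mean absolute error computed only
    at pixels where structure from motion produced a reconstructed point.

    pred_depth: (H, W) dense depth map predicted by the network
    sfm_depth:  (H, W) sparse depth values from SfM (arbitrary elsewhere)
    sfm_mask:   (H, W) boolean mask, True where an SfM point projects
    """
    # Only the sparse reconstructed points carry a direct depth signal;
    # the remaining pixels receive no supervision from this term.
    return np.abs(pred_depth[sfm_mask] - sfm_depth[sfm_mask]).mean()
```

In practice such a term would be combined with other losses (and the SfM depths are valid only up to scale), but the sketch shows why no manual labels or CT scans are needed: the sparse supervisory signal comes entirely from the endoscopic video itself.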