Accurate Monocular Depth Estimation (MDE) is critical for robotic surgery but remains fragile in specular, fluid-filled endoscopic environments. Existing self-supervised methods, which typically rely on foundation models trained with noisy real-world pseudo-labels, often suffer from boundary collapse on thin surgical tools and transparent surfaces. In this work, we address this by leveraging the high-fidelity synthetic priors of the Depth Anything V2 architecture, which inherently captures precise geometric detail on thin structures. We adapt these priors to the surgical domain efficiently using Dynamic Vector Low-Rank Adaptation (DV-LoRA), bridging the synthetic-to-real gap with a minimal trainable-parameter budget. We further introduce a physically stratified evaluation protocol on the SCARED dataset to rigorously quantify performance in high-specularity regimes that aggregate metrics often mask. Our approach establishes a new state of the art, achieving a δ < 1.25 threshold accuracy of 98.1% and reducing Squared Relative Error by over 17% relative to established baselines, demonstrating superior robustness under adverse surgical lighting.
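To make the adaptation mechanism concrete, the sketch below shows one plausible reading of a dynamic-vector low-rank adapter wrapped around a frozen linear layer. It assumes DV-LoRA follows the standard LoRA residual update W + BA augmented with trainable per-rank and per-output scaling vectors; the class name `DVLoRALinear` and the `lambda_a` / `lambda_b` vectors are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class DVLoRALinear(nn.Module):
    """Minimal sketch of a dynamic-vector low-rank adapter (assumed formulation).

    The pretrained weights (carrying the synthetic priors) stay frozen; only the
    low-rank factors and the scaling vectors are trained.
    """

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained projection frozen

        in_f, out_f = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_f, rank))        # up-projection, zero init
        self.lambda_a = nn.Parameter(torch.ones(rank))         # per-rank scale (assumed)
        self.lambda_b = nn.Parameter(torch.ones(out_f))        # per-output scale (assumed)
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank residual: x -> A -> per-rank scale -> B -> per-output scale
        delta = (x @ self.A.t()) * self.lambda_a
        delta = (delta @ self.B.t()) * self.lambda_b
        return self.base(x) + self.scaling * delta
```

In practice such adapters would wrap the attention projections of the ViT encoder, so the trainable-parameter count stays a small fraction of the full model.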
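The stratified evaluation can likewise be illustrated with a small sketch. The code below computes the two reported metrics, the δ < 1.25 threshold accuracy and the Squared Relative Error, separately over specular and diffuse pixel strata. The brightness heuristic, the `spec_thresh` value, and the function name `stratified_depth_metrics` are assumptions for illustration; the paper's physically based stratification criterion may differ.

```python
import torch

def stratified_depth_metrics(pred: torch.Tensor,
                             gt: torch.Tensor,
                             image: torch.Tensor,
                             spec_thresh: float = 0.9) -> dict:
    """Sketch of a specularity-stratified evaluation (assumed protocol).

    pred, gt: (H, W) depth maps; image: (3, H, W) RGB in [0, 1].
    """
    valid = gt > 0                        # SCARED ground truth is sparse
    brightness = image.max(dim=0).values  # crude proxy for specular highlights
    strata = {
        "specular": valid & (brightness >= spec_thresh),
        "diffuse": valid & (brightness < spec_thresh),
    }
    results = {}
    for name, mask in strata.items():
        d, g = pred[mask], gt[mask]
        ratio = torch.maximum(d / g, g / d)
        results[name] = {
            # threshold accuracy: fraction of pixels with max(d/g, g/d) < 1.25
            "delta<1.25": (ratio < 1.25).float().mean().item(),
            # squared relative error: mean of (d - g)^2 / g
            "sq_rel": ((d - g) ** 2 / g).mean().item(),
        }
    return results
```

Reporting these metrics per stratum, rather than only in aggregate, is what exposes the degradation in high-specularity regions that pooled numbers hide.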