Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes, where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE. Our approach uses an ensemble of MDE models to predict dense depth maps conditioned both on RGB images and on a sparse set of known depth measurements obtained via proprioception, i.e., where the robot has touched the environment in a known configuration. We quantify predictive uncertainty via the ensemble's variance and compute the gradient of this uncertainty with respect to candidate measurement locations. To select maximally informative locations to propriocept (touch) while preventing mode collapse, we apply Stein Variational Gradient Descent (SVGD) over this gradient map. We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results demonstrate that our approach outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements.
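To make the SVGD-based selection step concrete, the following is a minimal sketch of how particles (candidate probe locations) could be driven toward high-uncertainty image regions while a kernel repulsion term keeps them spread out. The names `grad_log_uncertainty`, `CENTERS`, `WEIGHTS`, and `SIGMA` are hypothetical: the Gaussian-mixture surrogate stands in for the ensemble-variance map, whose gradient in the actual method would be obtained by differentiating through the MDE ensemble rather than analytically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "uncertainty hotspots": stand-ins for image regions where the
# MDE ensemble's variance is high (e.g., textureless or specular patches).
# In the real pipeline this map and its gradient would come from the ensemble
# itself; here we use an analytic Gaussian-mixture surrogate for illustration.
CENTERS = np.array([[0.25, 0.30], [0.70, 0.65], [0.55, 0.20]])
WEIGHTS = np.array([1.0, 0.8, 0.6])
SIGMA = 0.12

def grad_log_uncertainty(X):
    """Analytic gradient of log U(x) for the surrogate uncertainty map U."""
    diff = X[:, None, :] - CENTERS[None, :, :]                # (n, m, 2)
    sq = np.sum(diff ** 2, axis=-1)                           # (n, m)
    comp = WEIGHTS[None, :] * np.exp(-sq / (2 * SIGMA ** 2))  # mixture terms
    U = comp.sum(axis=1, keepdims=True) + 1e-12               # (n, 1)
    gradU = np.sum(comp[..., None] * (-diff / SIGMA ** 2), axis=1)
    return gradU / U                                          # (n, 2)

def svgd_step(X, score, step=1e-3):
    """One SVGD update: kernel-weighted gradient ascent plus repulsion."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    h = np.median(sq) / np.log(X.shape[0] + 1) + 1e-8         # median heuristic
    K = np.exp(-sq / h)                                       # k(x_j, x_i)
    # grad_{x_j} k(x_j, x_i) = -(2/h) (x_j - x_i) k(x_j, x_i)
    dK = (-2.0 / h) * K[..., None] * (X[:, None, :] - X[None, :, :])
    phi = (K @ score + dK.sum(axis=0)) / X.shape[0]
    return X + step * phi

# Particles = candidate probe locations in normalized image coordinates.
X = rng.uniform(0.0, 1.0, size=(16, 2))
for _ in range(500):
    X = svgd_step(X, grad_log_uncertainty(X))
print(np.round(X, 3))  # particles cluster over the high-uncertainty regions
```

The repulsive `dK` term is what addresses the mode-collapse concern raised above: without it, all candidates would converge on the single highest-variance location, whereas the kernel gradient pushes particles apart so the selected touches cover multiple uncertain regions.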