Erroneous feature matches have a severe impact on subsequent camera pose estimation and often require additional, time-costly measures, such as RANSAC, for outlier rejection. Our method tackles this challenge by addressing feature matching and pose optimization jointly. To this end, we propose a graph attention network that predicts image correspondences along with confidence weights. The resulting matches serve as weighted constraints in a differentiable pose estimation. Training feature matching with gradients from pose optimization naturally learns to down-weight outliers and improves pose estimation on image pairs by 6.7% over SuperGlue on ScanNet. At the same time, it reduces pose estimation time by over 50% and renders RANSAC iterations unnecessary. Moreover, we integrate information from multiple views by spanning the graph across multiple frames to predict all matches at once. Multi-view matching combined with end-to-end training improves the pose estimation metrics on Matterport3D by 18.8% compared to SuperGlue.
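To illustrate the core idea of using confidence-weighted matches as constraints in a differentiable pose solver, the sketch below implements a weighted Kabsch alignment: given corresponding 3D points and per-match confidence weights, it recovers the relative rotation and translation in closed form via SVD. This is a minimal, hypothetical illustration (the function name and NumPy implementation are our own, not the paper's code); the actual method backpropagates pose-optimization gradients through the matching network, whereas this only shows how weights modulate each match's influence on the pose.

```python
import numpy as np

def weighted_kabsch(p, q, w):
    """Estimate rotation R and translation t minimizing
    sum_i w_i * ||R p_i + t - q_i||^2 (weighted least squares).

    p, q: (N, 3) arrays of corresponding 3D points.
    w:    (N,) non-negative confidence weights (e.g. match confidences).
    """
    w = w / w.sum()                          # normalize weights
    mu_p = (w[:, None] * p).sum(axis=0)      # weighted centroids
    mu_q = (w[:, None] * q).sum(axis=0)
    P, Q = p - mu_p, q - mu_q                # center both point sets
    H = (w[:, None] * P).T @ Q               # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t
```

Because every step (centroids, covariance, SVD) is differentiable almost everywhere, gradients of a pose error can flow back into the weights, so low-confidence (outlier) matches are naturally down-weighted during training.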