【Paopao Turing Think Tank】Flowdometry: Visual Odometry Based on Optical Flow and Deep Learning (IWCACV-1)



September 7, 2018 · 泡泡机器人SLAM (Paopao Robot SLAM)

Paopao Turing Think Tank: in-depth readings of papers from top robotics conferences

Title: Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry

Authors: Peter Muller, Andreas Savakis

Source: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV)

Narrator: 清蒸鱼

Translation and editing: 皮燕燕

Reviewer: 杨小育

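Individuals are welcome to share this article on WeChat Moments. Other organizations or media accounts that wish to republish it should leave a message on the official account to request authorization.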

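Summary

Hello everyone. Today's paper is "Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry", published at the 2017 IEEE Winter Conference on Applications of Computer Vision.

Visual odometry, which aims to produce a map of the traveled path from a stream of visual data, is a challenging task related to simultaneous localization and mapping (SLAM). Using one or two cameras, it estimates motion from the feature and pixel differences between captured frames. Because cameras capture frames at a high rate, consecutive frames generally differ only by small increments, so the optical flow between them can be assumed proportional to the physical distance moved by an egocentric reference, such as a camera mounted on a vehicle. The paper proposes a visual odometry system based on optical flow and deep learning, named Flowdometry. Optical flow images are fed to a convolutional neural network, which computes a rotation and displacement for each image pixel. The displacements and rotations are then applied incrementally to build a map of where the camera has traveled. The system is trained and tested on the KITTI visual odometry dataset, and accuracy is measured by the difference in distances between the ground-truth and predicted driving trajectories. The authors test the accuracy of several convolutional neural network architecture configurations and compare the results with other state-of-the-art monocular odometry systems on the same dataset. Flowdometry achieves an average translation error of 10.77% and an average rotation error of 0.0623 degrees per meter. Its total execution time is 0.633 seconds per optical flow frame, a 23.796x speedup over state-of-the-art methods that use deep learning.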

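Main Contributions

The main contributions are:

1. An end-to-end deep learning solution to visual odometry, with a network inspired by FlowNetS.

2. The system takes raw optical flow as input to a modified neural network architecture and requires no semi-supervised pre-training.

3. The system greatly reduces the execution time from raw camera data to odometry output.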

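Algorithm Pipeline

Figure 1: Block diagram of the Flowdometry system

a. Compute the optical flow between input frames with the FlowNetS network.

b. Feed the resulting optical flow into a FlowNetS-based network with a modified architecture to produce frame-to-frame odometry estimates.

c. Accumulate the odometry estimates to build a map of the traveled trajectory.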
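Step c amounts to dead reckoning: each frame-to-frame estimate is composed with the pose reached so far. The Python sketch below illustrates the idea under simplifying assumptions that do not come from the paper: motion is planar, and each estimate is a hypothetical pair (d, dtheta) giving the forward displacement in meters and the yaw change in radians. The function name `accumulate_trajectory` is illustrative only.

```python
import math


def accumulate_trajectory(increments, start=(0.0, 0.0, 0.0)):
    """Chain per-frame odometry estimates into a list of planar poses.

    Assumption (not from the paper): each increment is a hypothetical
    (d, dtheta) pair, d in meters along the current heading and dtheta
    in radians. Returns (x, y, theta) poses, beginning with `start`.
    """
    x, y, theta = start
    poses = [(x, y, theta)]
    for d, dtheta in increments:
        # Apply the rotation first, then move along the updated heading.
        theta += dtheta
        x += d * math.cos(theta)
        y += d * math.sin(theta)
        poses.append((x, y, theta))
    return poses


# Four 1 m steps separated by 90-degree turns trace a unit square.
square = accumulate_trajectory(
    [(1.0, 0.0), (1.0, math.pi / 2), (1.0, math.pi / 2), (1.0, math.pi / 2)]
)
```

The square example ends back at the origin, up to floating-point error. Rotating before translating is one choice among several (using the midpoint heading is another). Because every increment is composed onto all earlier ones, small per-frame errors accumulate into the trajectory drift that the benchmark measures.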

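Main Results

1. Dataset

Table 1 lists the number of frames in each video sequence of the KITTI odometry benchmark. Adding mirrored sequences doubles the amount of training data available to the system.

Table 1: Datasets used for training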
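The summary does not say how the mirrored sequences are produced. One plausible construction, assumed here rather than taken from the paper, mirrors each frame left to right. In the optical flow field this reverses each row of pixels and negates the horizontal component u. In a planar odometry label it flips the sign of the yaw change, because a left turn becomes a right turn, and leaves the forward displacement unchanged. The helper names and the (d, dtheta) label format are hypothetical.

```python
def mirror_flow(flow):
    """Mirror a dense optical flow field left to right.

    `flow` is a list of rows, each a list of (u, v) tuples, where u is the
    horizontal and v the vertical displacement in pixels. Reversing each
    row mirrors pixel positions; negating u keeps each vector consistent
    with the mirrored image, while v is unchanged.
    """
    return [[(-u, v) for (u, v) in reversed(row)] for row in flow]


def mirror_label(d, dtheta):
    """Mirror a hypothetical planar label (forward displacement, yaw change).

    A left-right mirror swaps left and right turns, so the yaw change
    flips sign; forward displacement stays the same.
    """
    return d, -dtheta


def augment_with_mirror(samples):
    """Return the original (flow, label) samples followed by mirrored copies."""
    mirrored = [(mirror_flow(flow), mirror_label(*label)) for flow, label in samples]
    return list(samples) + mirrored
```

Mirroring twice returns the original flow field, which is a quick sanity check for this kind of augmentation.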

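2. Evaluation

Table 2 shows the average errors of Flowdometry and compares them with VISO2-M, SVR VO, and P-CNN.

Table 2: Translation and rotation errors per video sequence for different visual odometry methods.

In the odometry benchmark, rotation and translation errors are evaluated as functions of sequence length and vehicle speed. Figure 2 plots the errors, averaged over all test sequences, against each of these factors.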

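(a) Average rotation error vs. sequence length

(b) Average rotation error vs. vehicle speed

(c) Average translation error vs. sequence length

(d) Average translation error vs. vehicle speed

Figure 2: Average errors over all sequences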
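The KITTI odometry benchmark scores a trajectory over sub-paths of fixed length (100 m to 800 m) and normalizes the end-point error by that length, giving translation error in percent and rotation error in degrees per meter. The sketch below is a simplified planar (x, y, theta) re-implementation for illustration only. The official KITTI devkit works with full 3D poses and also groups errors by vehicle speed, which is omitted here, and the function names are hypothetical.

```python
import math


def relative(a, b):
    """Pose b expressed in the frame of pose a, for planar (x, y, theta) poses."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    c, s = math.cos(a[2]), math.sin(a[2])
    return (c * dx + s * dy, -s * dx + c * dy, b[2] - a[2])


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def kitti_style_errors(gt, est, lengths=(100, 200, 300, 400, 500, 600, 700, 800), step=10):
    """Average translation error (%) and rotation error (deg/m) over sub-paths.

    `gt` and `est` are equally long lists of planar poses, one per frame.
    Returns None if the trajectory is too short for any sub-path length.
    """
    # Cumulative ground-truth path length at each frame.
    dist = [0.0]
    for p, q in zip(gt, gt[1:]):
        dist.append(dist[-1] + math.hypot(q[0] - p[0], q[1] - p[1]))

    t_errs, r_errs = [], []
    for first in range(0, len(gt), step):
        for length in lengths:
            # First frame at which the path has covered more than `length` meters.
            last = next((i for i in range(first, len(gt)) if dist[i] > dist[first] + length), None)
            if last is None:
                continue
            gt_motion = relative(gt[first], gt[last])
            est_motion = relative(est[first], est[last])
            # Residual motion between the estimated and ground-truth sub-paths.
            err = relative(est_motion, gt_motion)
            t_errs.append(math.hypot(err[0], err[1]) / length)
            r_errs.append(abs(wrap(err[2])) / length)

    if not t_errs:
        return None
    t_percent = 100.0 * sum(t_errs) / len(t_errs)
    r_deg_per_m = math.degrees(sum(r_errs) / len(r_errs))
    return t_percent, r_deg_per_m
```

For example, an estimate that overshoots a straight 1000-frame ground-truth path by 10% scores roughly 10% translation error and zero rotation error.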

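To show that the Flowdometry system did not overfit during training and retains some ability to generalize, Figure 3 presents an example plot from sequence 00, one of the training sequences.

Figure 3: Odometry estimate for sequence 00 compared with the ground-truth path. This sequence is one of the Flowdometry system's training sequences, and the result indicates that the system does not overfit the training data.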

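3. Execution Time

Table 3 compares the execution time of each method.

Table 3: Execution time of each method. The last column shows how many times faster Flowdometry's total execution time is than that of each other method.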
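The speedup in the last column can be read as a ratio of total per-frame execution times. Assuming the 23.796x figure quoted in the abstract is computed this way, and using Flowdometry's 0.633 s per optical flow frame, the implied execution time of the deep-learning method it is compared against is:

$$
\text{speedup} = \frac{T_{\text{other}}}{T_{\text{Flowdometry}}}
\quad\Longrightarrow\quad
T_{\text{other}} = 23.796 \times 0.633\ \text{s} \approx 15.06\ \text{s per frame}
$$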

Abstract

Visual odometry is a challenging task related to simultaneous localization and mapping that aims to generate a map traveled from a visual data stream. Based on one or two cameras, motion is estimated from features and pixel differences between frames. Because of the frame rate of the cameras, there are generally small, incremental changes between subsequent frames where optical flow can be assumed to be proportional to the physical distance moved by an egocentric reference, such as a camera on a vehicle. In this paper, a visual odometry system called Flowdometry is proposed based on optical flow and deep learning. Optical flow images are used as input to a convolutional neural network, which calculates a rotation and displacement for each image pixel. The displacements and rotations are applied incrementally to construct a map of where the camera has traveled. The proposed system is trained and tested on the KITTI visual odometry dataset, and accuracy is measured by the difference in distances between ground truth and predicted driving trajectories. Different convolutional neural network architecture configurations are tested for accuracy, and then results are compared to other state-of-the-art monocular odometry systems using the same dataset. The average translation error from the Flowdometry system is 10.77% and the average rotation error is 0.0623 degrees per meter. The total execution time of the system per optical flow frame is 0.633 seconds, which offers a 23.796x speedup over state-of-the-art methods using deep learning.

If you are interested in this paper and want to read the full text, follow the 【泡泡机器人SLAM】 (Paopao Robot SLAM) WeChat official account.


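Welcome to the Paopao Forum, where experts answer any question you have about SLAM.

Whether you want to ask a question or help answer others', the Paopao Forum welcomes you!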

Paopao website: www.paopaorobot.org

Paopao forum: http://paopaorobot.org/forums/