Recent advancements in perception for autonomous driving are driven by deep learning. To achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g., cameras, LiDARs, radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and the questions of "what to fuse", "when to fuse", and "how to fuse" remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information on object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://multimodalperception.github.io.