Detecting pedestrians and predicting their behavior is crucial for self-driving vehicles to plan around and interact with them safely. Although several research works address this problem, models must also be fast and memory-efficient so that they can run on the embedded hardware in these autonomous machines. In this work, we propose a novel architecture that uses spatio-temporal multi-tasking for camera-based pedestrian detection and intention prediction. Our approach significantly reduces latency by detecting all pedestrians and predicting their intentions in a single shot, and it attains better accuracy by sharing features with relevant object-level information and interactions.