Predicting the future motion of dynamic agents is essential for ensuring safety or assessing risk in motion planning for autonomous robots. In this paper, we propose a two-stage motion prediction method, referred to as R-Pred, that exploits both scene and interaction context through a cascade of an initial trajectory proposal network and a trajectory refinement network. The initial trajectory proposal network produces M trajectory proposals corresponding to M modes of the future trajectory distribution. The trajectory refinement network enhances each of the M proposals using 1) tube-query scene attention (TQSA) and 2) proposal-level interaction attention (PIA). TQSA uses tube-queries to aggregate local scene context features pooled from the vicinity of the trajectory proposals of interest. PIA further enhances the trajectory proposals by modeling inter-agent interactions using a group of trajectory proposals selected based on their distances from neighboring agents. Our experiments on the Argoverse and nuScenes datasets demonstrate that the proposed refinement network provides significant performance improvements over the single-stage baseline and that R-Pred achieves state-of-the-art performance in some categories of the benchmark.
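The two selection steps that the refinement stage relies on can be illustrated with a minimal sketch. It is not the R-Pred implementation: the function names, the scalar scene features, the `radius` and `k` parameters, and the minimum per-timestep distance criterion are all assumptions made for illustration, and plain averaging stands in for the attention modules described above.

```python
import math

# Illustrative sketch only; names and criteria below are assumptions,
# not taken from the R-Pred paper or its code.


def tube_pool(proposal, scene_points, radius):
    """Average the features of scene points inside a "tube" around a proposal.

    proposal:     list of (x, y) waypoints of one trajectory proposal
    scene_points: list of (x, y, feature) tuples, feature being a float
    radius:       tube radius (hypothetical parameter)

    Averaging is a stand-in for TQSA, which uses attention instead.
    """
    pooled = [
        feat
        for (px, py, feat) in scene_points
        if any(math.hypot(px - x, py - y) <= radius for (x, y) in proposal)
    ]
    return sum(pooled) / len(pooled) if pooled else 0.0


def select_neighbor_proposals(proposal, neighbor_proposals, k):
    """Keep the k neighbor proposals closest to the given proposal.

    Closeness is assumed to be the minimum distance between waypoints
    at the same timestep; the abstract does not specify the criterion.
    """
    def min_dist(other):
        return min(
            math.hypot(x1 - x2, y1 - y2)
            for (x1, y1), (x2, y2) in zip(proposal, other)
        )

    return sorted(neighbor_proposals, key=min_dist)[:k]
```

In a full model, the pooled scene features and the selected neighbor proposals would be fed to attention layers that output refined trajectories; here they are returned directly so that the selection logic can be inspected on its own.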