Imitation learning (IL) enables autonomous behavior by learning from expert demonstrations. While more sample-efficient than alternatives such as reinforcement learning, IL is sensitive to compounding errors induced by distribution shifts. When IL-based feedback laws are deployed on real systems, there are two significant sources of distribution shift: shifts caused by policy error, and shifts caused by exogenous disturbances and endogenous model errors due to imperfect learning. Our previously developed approaches, Taylor Series Imitation Learning (TaSIL) and $\mathcal{L}_1$-Distributionally Robust Adaptive Control (\ellonedrac), address these challenges in complementary ways: TaSIL provides robustness against policy-error-induced distribution shifts, while \ellonedrac provides robustness against distribution shifts due to aleatoric and epistemic uncertainties. To enable certifiable IL for learned and/or uncertain dynamical systems, we formulate the \textit{Distributionally Robust Imitation Policy (DRIP)} architecture, a Layered Control Architecture (LCA) that integrates TaSIL and~\ellonedrac. By judiciously designing the input and output requirements of each layer, we show how to guarantee certificates for the entire control pipeline. Our solution paves the way toward fully certifiable autonomy pipelines, integrating learning-based components, such as perception, with certifiable model-based decision-making through the proposed LCA approach.
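For intuition, a minimal schematic of the layered composition follows; the additive structure and the symbols $\pi_{\mathrm{IL}}$ (the TaSIL-trained policy) and $u_{\mathrm{ad}}$ (the \ellonedrac adaptive input) are illustrative placeholders, not the notation fixed in the body of the paper:
\[
  u(t) \;=\; \underbrace{\pi_{\mathrm{IL}}\big(x(t)\big)}_{\text{outer IL layer}} \;+\; \underbrace{u_{\mathrm{ad}}(t)}_{\text{inner adaptive layer}},
\]
where, under this assumed interface, the outer layer imitates the expert on the nominal dynamics, and the inner layer compensates for aleatoric and epistemic uncertainty so that the closed-loop state distribution remains within a certified bound of the distribution the IL layer was trained and certified on.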