Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. While existing safe offline reinforcement learning (RL) methods typically enforce soft expected-cost constraints, they do not guarantee forward invariance of the safe set. Conversely, Control Barrier Functions (CBFs) provide rigorous safety guarantees but usually depend on expert-designed barrier functions or full knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to a dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used within a Quadratic Program (QP) formulation to synthesize safe control in real time. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.
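To make the expectile-based objective concrete, below is a minimal sketch of an asymmetric (expectile) regression loss in PyTorch, in the style popularized by offline RL methods such as implicit Q-learning. The function name `expectile_loss` and the parameter `tau` are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of an expectile regression loss, assuming the standard
# asymmetric-L2 form from the offline RL literature; the exact V-OCBF
# objective may differ.
import torch


def expectile_loss(pred: torch.Tensor, target: torch.Tensor,
                   tau: float = 0.9) -> torch.Tensor:
    """Asymmetric L2 loss: weights positive errors by tau, negative by 1 - tau.

    With tau > 0.5 the regressor approximates an upper expectile of the
    target distribution, which lets value/barrier estimates be fit using
    only actions that appear in the dataset, instead of querying the
    barrier on out-of-distribution actions.
    """
    diff = target - pred
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff.pow(2)).mean()
```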
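For reference, the QP formulation mentioned above typically takes the classical CBF safety-filter form shown below: a min-norm modification of a nominal action subject to a barrier condition. This sketch assumes known control-affine dynamics terms `f_x` and `g_x` purely for illustration (V-OCBF itself is model-free and would build the analogous constraint from the learned barrier); all names are hypothetical, not the paper's API.

```python
# A minimal sketch of the standard CBF-QP safety filter, assuming
# control-affine dynamics dx/dt = f(x) + g(x) u and a barrier h(x).
import numpy as np


def cbf_qp_filter(u_nom: np.ndarray, grad_h: np.ndarray, f_x: np.ndarray,
                  g_x: np.ndarray, h_x: float, alpha: float = 1.0) -> np.ndarray:
    """Solve  min_u ||u - u_nom||^2  s.t.  grad_h . (f(x) + g(x) u) + alpha * h(x) >= 0.

    With a single affine constraint a.u + b >= 0, the QP has the closed form
    u* = u_nom + max(0, -(a.u_nom + b)) * a / ||a||^2 (a projection onto the
    constraint half-space), so no general-purpose QP solver is needed here.
    """
    a = g_x.T @ grad_h                 # constraint normal in action space
    b = grad_h @ f_x + alpha * h_x     # constraint offset
    violation = -(a @ u_nom + b)
    if violation <= 0.0:               # nominal action already satisfies the CBF condition
        return u_nom
    return u_nom + (violation / (a @ a)) * a
```

The closed-form projection is a common shortcut for the single-constraint case; with additional constraints (e.g., actuator limits) one would instead call a QP solver.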