Reinforcement learning (RL) is a powerful framework for optimizing decision-making in complex systems under uncertainty, a challenge that is central to many real-world settings and particularly to the energy transition. A representative example is the remote microgrid, which supplies power to communities disconnected from the main grid. Enabling the energy transition in such systems requires coordinated control of renewable sources such as wind turbines, alongside fuel generators and batteries, to meet demand while minimizing fuel consumption and battery degradation under exogenous, intermittent load and wind conditions. These systems must often conform to extensive regulations and complex operational constraints. To ensure that RL agents respect these constraints, it is crucial to provide interpretable guarantees. In this paper, we introduce Shielded Controller Units (SCUs), a systematic and interpretable approach that leverages prior knowledge of system dynamics to ensure constraint satisfaction. Our shield synthesis methodology, designed for real-world deployment, decomposes the environment into a hierarchical structure in which each SCU explicitly manages a subset of the constraints. We demonstrate the effectiveness of SCUs on a remote microgrid optimization task with strict operational requirements. The RL agent equipped with SCUs achieves a 24% reduction in fuel consumption without increasing battery degradation, outperforming other baselines while satisfying all constraints. We hope SCUs contribute to the safe application of RL to the many decision-making challenges linked to the energy transition.