Reinforcement learning (RL) excels in a wide range of applications but struggles in dynamic environments where the underlying Markov decision process evolves. Continual reinforcement learning (CRL) enables RL agents to learn and adapt to new tasks over time, but balancing stability (preserving prior knowledge) and plasticity (acquiring new knowledge) remains challenging. Existing methods primarily address the stability-plasticity dilemma through mechanisms in which past knowledge influences optimization but rarely affects the agent's behavior directly, which may hinder effective knowledge reuse and efficient learning. In contrast, we propose demonstration-guided continual reinforcement learning (DGCRL), which stores prior knowledge in an external, self-evolving demonstration repository that directly guides exploration and adaptation. For each task, the agent dynamically selects the most relevant demonstration and follows a curriculum-based strategy that accelerates learning by gradually shifting from demonstration-guided exploration to fully self-directed exploration. Extensive experiments on 2D navigation and MuJoCo locomotion tasks demonstrate that DGCRL achieves superior average performance, stronger knowledge transfer, reduced forgetting, and improved training efficiency. A sensitivity analysis and an ablation study further validate its effectiveness.
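To make the described mechanism concrete, the following is a minimal, hypothetical sketch of the two ingredients outlined above: relevance-based retrieval of a demonstration from an external repository, and a curriculum that anneals the probability of following the retrieved demonstration toward fully self-directed exploration. All names (`DemonstrationRepository`, `guidance_probability`, `select_action`), the nearest-neighbour relevance measure, and the linear schedule are illustrative assumptions, not the paper's actual implementation.

```python
import random

class DemonstrationRepository:
    """Hypothetical external repository of demonstrations from past tasks.

    Each demonstration is stored with a task descriptor (here, a plain
    feature vector) so that the most relevant one can be retrieved for a
    new task. This is an illustrative stand-in for the self-evolving
    repository described in the abstract.
    """

    def __init__(self):
        self.entries = []  # list of (task_features, demonstration) pairs

    def add(self, task_features, demonstration):
        self.entries.append((task_features, demonstration))

    def most_relevant(self, task_features):
        # Return the stored demonstration whose task descriptor is closest
        # (squared Euclidean distance) to the current task's descriptor.
        if not self.entries:
            return None

        def dist(entry):
            features, _ = entry
            return sum((a - b) ** 2 for a, b in zip(features, task_features))

        return min(self.entries, key=dist)[1]


def guidance_probability(step, total_steps, start=1.0, end=0.0):
    """Linear curriculum (an assumed schedule): the probability of following
    the demonstration decays from `start` to `end` over training, shifting
    the agent from demonstration-guided to self-directed exploration."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac


def select_action(step, total_steps, demo_action, own_action, rng=random):
    """With the curriculum probability, imitate the demonstration's action;
    otherwise act from the agent's own (exploring) policy."""
    if demo_action is not None and rng.random() < guidance_probability(step, total_steps):
        return demo_action
    return own_action
```

Under this sketch's linear schedule with, say, `total_steps=10000`, the agent would follow the retrieved demonstration roughly 80% of the time at step 2000 and act entirely on its own policy by the end of training.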