Reinforcement learning (RL) has shown promise in solving various combinatorial optimization problems. However, conventional RL struggles with complex, real-world constraints, especially when the feasibility of the action space depends explicitly on the current state or trajectory. In this work, we address stochastic, sequential decision-making problems with state-dependent constraints. As a relevant real-world case study, we focus on the master stowage planning problem in container shipping, which aims to optimize revenue and operational costs under demand uncertainty and operational constraints. We propose a deep RL framework that combines an encoder-decoder model, which embeds problem instances, current solutions, and demand uncertainty to guide learning, with feasibility layers that satisfy convex constraints while maintaining unbiased gradient flow. Experiments show that our model efficiently finds adaptive, feasible solutions that generalize across varying distributions and scale to larger instances, outperforming state-of-the-art baselines from constrained RL and stochastic programming. By uniting artificial intelligence and operations research, our policy empowers humans to make adaptive, uncertainty-aware decisions for resilient and sustainable planning.
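The abstract does not specify the exact form of the feasibility layers, so the following is only a minimal sketch of the general idea: project raw policy outputs onto a convex feasible set (here, a hypothetical aggregate capacity/demand bound for container allocations) using purely differentiable operations, so gradients pass through the projection without clipping or masking after sampling. The class name, tensor shapes, and constraint are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class FeasibilityLayer(nn.Module):
    """Illustrative differentiable projection onto a convex feasible set.

    Assumption: allocations across locations must be non-negative and their
    sum may not exceed the remaining capacity or the realized demand.
    """

    def forward(self, logits, demand, capacity):
        # Convex combination over locations (a point on the probability simplex).
        weights = torch.softmax(logits, dim=-1)
        # Total volume to place: bounded by both demand and remaining capacity.
        total = torch.minimum(demand, capacity)
        # Scaled simplex point: entries are non-negative and sum to `total`,
        # so the aggregate constraint holds by construction and the mapping
        # stays differentiable end to end.
        return weights * total.unsqueeze(-1)


if __name__ == "__main__":
    layer = FeasibilityLayer()
    logits = torch.randn(2, 4, requires_grad=True)   # batch of 2, 4 locations (hypothetical)
    demand = torch.tensor([120.0, 80.0])             # sampled demand realization
    capacity = torch.tensor([100.0, 100.0])          # remaining vessel capacity
    alloc = layer(logits, demand, capacity)
    alloc.sum().backward()                           # gradients reach the logits unchanged
    print(alloc, logits.grad is not None)
```

A projection of this kind keeps every sampled action feasible by construction, which is one way to reconcile hard convex constraints with on-policy gradient estimation.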