We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and a custom Unity 3D simulation. Our experiments show that QMIX's value decomposition significantly outperforms independent learning (mean return 3.25 vs. 0.38 for advanced IPPO), but requires extensive hyperparameter tuning, particularly an extended epsilon-annealing schedule (5M+ steps) to discover sparse rewards. We demonstrate successful deployment in Unity ML-Agents, achieving consistent package delivery after 1M training steps. While MARL shows promise for small-scale deployments (2-4 robots), significant scaling challenges remain. Code and analyses: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/
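For reference, a minimal sketch of the kind of extended linear epsilon-annealing schedule the abstract refers to. The 5M-step horizon comes from the text; the start/end values (1.0 to 0.05) and the linear shape are assumptions for illustration, not the paper's exact configuration.

```python
def epsilon(step: int,
            anneal_steps: int = 5_000_000,  # 5M-step horizon, per the abstract
            eps_start: float = 1.0,         # assumed initial exploration rate
            eps_end: float = 0.05) -> float: # assumed final exploration floor
    """Linearly anneal epsilon from eps_start down to eps_end over anneal_steps,
    then hold it at eps_end for the rest of training."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

Stretching the anneal this far keeps agents exploring long enough to stumble onto RWARE's sparse delivery rewards before the policy becomes greedy.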