We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and a custom Unity 3D simulation. Our experiments reveal that QMIX's value decomposition significantly outperforms independent learning approaches (a mean return of 3.25 vs. 0.38 for advanced IPPO), but requires extensive hyperparameter tuning, particularly extended epsilon annealing (5M+ steps) to discover sparse rewards. We demonstrate successful deployment in Unity ML-Agents, achieving consistent package delivery after 1M training steps. While MARL shows promise for small-scale deployments (2-4 robots), significant scaling challenges remain. Code and analyses: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/
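As a rough illustration of the extended exploration schedule mentioned above, the sketch below shows a linear epsilon anneal stretched over 5M steps. The start/end values and linear shape are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch (assumed schedule, not the authors' exact one) of linear
# epsilon annealing stretched over 5M steps so that epsilon-greedy agents
# keep exploring long enough to discover sparse warehouse rewards.
def epsilon(step: int,
            eps_start: float = 1.0,    # assumed initial exploration rate
            eps_end: float = 0.05,     # assumed final exploration rate
            anneal_steps: int = 5_000_000) -> float:
    """Linearly decay epsilon from eps_start to eps_end over anneal_steps."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# Exploration stays high far longer than under a typical short
# (e.g., 50k-step) schedule, which is the point of extending the anneal.
for s in (0, 1_000_000, 2_500_000, 5_000_000):
    print(f"step {s:>9,}: epsilon = {epsilon(s):.3f}")
```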