Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search

Games · Game reinforcement learning · Search · Reinforcement learning · Benchmarks

Samuel Sokota, Eugene Vinitsky, Hengyuan Hu, J. Zico Kolter, Gabriele Farina

Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as a case where such efforts failed to produce performance at the level of top humans. This work establishes a step change in both performance and cost for Stratego, showing that it is now possible not only to reach the level of top humans, but to achieve vastly superhuman level -- and that doing so requires not an industrial budget, but merely a few thousand dollars. We achieved this result by developing general approaches for self-play reinforcement learning and test-time search under imperfect information.
