可直接利用性:在大型游戏中学习最佳应对办法 (Approximate exploitability: Learning a best response in large games) - 专知论文

会员服务 ·

0

近似 · 学成 · Performance · 知识 (knowledge) · 稳健性 ·

2022 年 5 月 5 日

Approximate exploitability: Learning a best response in large games

翻译：可直接利用性:在大型游戏中学习最佳应对办法

Finbarr Timbers,Nolan Bard,Edward Lockhart,Marc Lanctot,Martin Schmid,Neil Burch,Julian Schrittwieser,Thomas Hubert,Michael Bowling

Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically fails to evaluate robustness to worst-case outcomes. Prior research in computer poker has examined how to assess such worst-case performance, both exactly and approximately. Unfortunately, exact computation is infeasible with larger domains, and existing approximations rely on poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, thereby approximating worst-case performance. We demonstrate the technique in several two-player zero-sum games against a variety of agents, including several AlphaZero-based agents.

翻译：研究人员已经表明,神经网络很容易受到对抗性实例和环境微妙变化的影响,两者都可被视为一种分配转移形式。对人类来说,由此产生的错误可能看起来像错误,削弱对这些代理人的信任。在以前的游戏研究中,代理评价往往侧重于实际操作游戏的结果。虽然这种评价很宝贵,但通常无法评价最坏情况结果的稳健性。计算机扑克先前的研究已经研究过如何准确和大致地评估这种最坏情况的性能。不幸的是,精确的计算在更大的领域是行不通的,而现有的近似则依赖于对扑克的特定知识。我们引入了基于可扩缩的搜索深度强化学习算法ISMCTS-BR,以学习对代理人的最佳反应,从而接近最坏情况的表现。我们展示了针对多种代理人,包括若干以阿尔法泽罗为基础的代理人的两次玩牌零和游戏的技术。

0

相关内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

模糊情况下的最优消费与投资

国家自然科学基金

3+阅读 · 2015年12月31日

肝细胞癌Gd-EOB-DTPA增强MR成像肝胆期信号的分子病理机制及对恶性程度与预后评估的研究

国家自然科学基金

0+阅读 · 2014年12月31日

Delta-Sarcoglycan基因的两个新突变在东亚人遗传性心肌病中的致病作用及其机理

国家自然科学基金

0+阅读 · 2014年12月31日

有理函数非旋转Fatou域与不连通Julia集的结构

国家自然科学基金

0+阅读 · 2014年12月31日

TGF-beta信号通路在小细胞肺癌转移中的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

赋值理论与几何不等式的研究

国家自然科学基金

1+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

HANF: Hyperparameter And Neural Architecture Search in Federated Learning

Arxiv

1+阅读 · 2022年6月24日

Inference on the Best Policies with Many Covariates

Inference on the Best Policies with Many Covariates

Arxiv

0+阅读 · 2022年6月23日

A Diagnostic Approach to Assess the Quality of Data Splitting in Machine Learning

A Diagnostic Approach to Assess the Quality of Data Splitting in Machine Learning

Arxiv

0+阅读 · 2022年6月23日

Auto-Encoding Adversarial Imitation Learning

Arxiv

0+阅读 · 2022年6月22日

Learning to Search in Task and Motion Planning with Streams

Arxiv

0+阅读 · 2022年6月21日

Neural Architecture Search without Training

Neural Architecture Search without Training

Arxiv

10+阅读 · 2021年6月11日

Multi-Agent Cooperative Bidding Games for Multi-Objective Optimization in e-Commercial Sponsored Search

Arxiv

12+阅读 · 2021年6月8日

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Arxiv

17+阅读 · 2020年9月8日

A Survey of Adversarial Learning on Graphs

Arxiv

38+阅读 · 2020年3月10日

Adversarial Transfer Learning

Adversarial Transfer Learning

Arxiv

12+阅读 · 2018年12月6日

VIP会员

文章信息

相关主题

知识 (knowledge)

相关VIP内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新型数字杀伤链：理解综合战术网络对野战炮兵体系的能力与效益

《对抗环境中运用数字孪生技术优化预测性维护与后勤保障》2025最新93页

《任务式指挥十六个案例研究》232页

《幻觉还是事实：国防大型语言模型的可信度评估研究》2025最新109页

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

HANF: Hyperparameter And Neural Architecture Search in Federated Learning

Arxiv

1+阅读 · 2022年6月24日

Inference on the Best Policies with Many Covariates

Inference on the Best Policies with Many Covariates

Arxiv

0+阅读 · 2022年6月23日

A Diagnostic Approach to Assess the Quality of Data Splitting in Machine Learning

A Diagnostic Approach to Assess the Quality of Data Splitting in Machine Learning

Arxiv

0+阅读 · 2022年6月23日

Auto-Encoding Adversarial Imitation Learning

Arxiv

0+阅读 · 2022年6月22日

Learning to Search in Task and Motion Planning with Streams

Arxiv

0+阅读 · 2022年6月21日

Neural Architecture Search without Training

Neural Architecture Search without Training

Arxiv

10+阅读 · 2021年6月11日

Multi-Agent Cooperative Bidding Games for Multi-Objective Optimization in e-Commercial Sponsored Search

Arxiv

12+阅读 · 2021年6月8日

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Arxiv

17+阅读 · 2020年9月8日

A Survey of Adversarial Learning on Graphs

Arxiv

38+阅读 · 2020年3月10日

Adversarial Transfer Learning

Adversarial Transfer Learning

Arxiv

12+阅读 · 2018年12月6日

相关基金

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

模糊情况下的最优消费与投资

国家自然科学基金

3+阅读 · 2015年12月31日

肝细胞癌Gd-EOB-DTPA增强MR成像肝胆期信号的分子病理机制及对恶性程度与预后评估的研究

国家自然科学基金

0+阅读 · 2014年12月31日

Delta-Sarcoglycan基因的两个新突变在东亚人遗传性心肌病中的致病作用及其机理

国家自然科学基金

0+阅读 · 2014年12月31日

有理函数非旋转Fatou域与不连通Julia集的结构

国家自然科学基金

0+阅读 · 2014年12月31日

TGF-beta信号通路在小细胞肺癌转移中的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

赋值理论与几何不等式的研究

国家自然科学基金

1+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员