了解为寻求情报而探索资料的来源,了解控制概率目标 (Understanding the origin of information-seeking exploration in probabilistic objectives for control) - 专知论文

会员服务 ·

0

散度 · 可理解性 · 极大 · 原点 · INFORMS ·

2021 年 3 月 16 日

Understanding the origin of information-seeking exploration in probabilistic objectives for control

翻译：了解为寻求情报而探索资料的来源,了解控制概率目标

Beren Millidge,Alexander Tschantz,Anil Seth,Christopher Buckley

from arxiv, 11-03-21 initial upload. 14-03-21 fix Charnov citation. 16-03-21 another fix

The exploration-exploitation trade-off is central to the description of adaptive behaviour in fields ranging from machine learning, to biology, to economics. While many approaches have been taken, one approach to solving this trade-off has been to equip or propose that agents possess an intrinsic 'exploratory drive' which is often implemented in terms of maximizing the agents information gain about the world -- an approach which has been widely studied in machine learning and cognitive science. In this paper we mathematically investigate the nature and meaning of such approaches and demonstrate that this combination of utility maximizing and information-seeking behaviour arises from the minimization of an entirely difference class of objectives we call divergence objectives. We propose a dichotomy in the objective functions underlying adaptive behaviour between \emph{evidence} objectives, which correspond to well-known reward or utility maximizing objectives in the literature, and \emph{divergence} objectives which instead seek to minimize the divergence between the agent's expected and desired futures, and argue that this new class of divergence objectives could form the mathematical foundation for a much richer understanding of the exploratory components of adaptive and intelligent action, beyond simply greedy utility maximization.

翻译：勘探-开发权衡是描述从机器学习、生物学到经济学等领域适应行为的核心。虽然采取了许多办法,但解决这一权衡的一个办法是,装备或提议代理人拥有内在的“探索动力”,这往往是为了最大限度地增加有关世界的代理信息收益,这是在机器学习和认知科学中广泛研究的一种方法。在本文中,我们用数学来研究这种方法的性质和含义,并表明这种效用最大化和寻求信息行为的结合产生于尽可能缩小一个完全不同的目标类别,我们称之为差异目标。我们提议将适应行为背后的客观功能分为以下两种:一种是文献中众所周知的奖励或效用最大化目标,另一种是力求最大限度地缩小代理人预期和期望的未来之间的差距的目标。我们主张,这种新的差异目标类别可以构成数学基础,以便更深入地了解适应和智能行动的探索性组成部分,而不仅仅是贪婪效用最大化。

0

相关内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【AAAI 2020】InteractE: 通过增加特征交互来改进基于卷积的知识图谱嵌入， InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

【AAAI 2020】InteractE: 通过增加特征交互来改进基于卷积的知识图谱嵌入， InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

专知会员服务

53+阅读 · 2020年6月7日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Yoshua Bengio，使算法知道“为什么”

Yoshua Bengio，使算法知道“为什么”

专知会员服务

8+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

The Causal Learning of Retail Delinquency

Arxiv

14+阅读 · 2020年12月17日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Products of Euclidean metrics and applications to proximity questions among curves

Arxiv

3+阅读 · 2020年4月13日

Probabilistic Logic Neural Networks for Reasoning

Arxiv

7+阅读 · 2019年6月20日

Generalization and Regularization in DQN

Generalization and Regularization in DQN

Arxiv

6+阅读 · 2019年1月30日

A Probe into Understanding GAN and VAE models

A Probe into Understanding GAN and VAE models

Arxiv

9+阅读 · 2018年12月13日

A Tour of Reinforcement Learning: The View from Continuous Control

Arxiv

6+阅读 · 2018年6月25日

Understanding disentangling in $β$-VAE

Arxiv

4+阅读 · 2018年4月10日

Learning to Adapt: Meta-Learning for Model-Based Control

Arxiv

9+阅读 · 2018年3月30日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【AAAI 2020】InteractE: 通过增加特征交互来改进基于卷积的知识图谱嵌入， InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

【AAAI 2020】InteractE: 通过增加特征交互来改进基于卷积的知识图谱嵌入， InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

专知会员服务

53+阅读 · 2020年6月7日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Yoshua Bengio，使算法知道“为什么”

Yoshua Bengio，使算法知道“为什么”

专知会员服务

8+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《使用量化测量将传感器节点关联到融合中心的算法设计》171页

军事前沿模型

提升军事训练能力的最佳人工智能模拟工具

《社交媒体信息作战》最新48页技术报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

相关论文

The Causal Learning of Retail Delinquency

Arxiv

14+阅读 · 2020年12月17日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Products of Euclidean metrics and applications to proximity questions among curves

Arxiv

3+阅读 · 2020年4月13日

Probabilistic Logic Neural Networks for Reasoning

Arxiv

7+阅读 · 2019年6月20日

Generalization and Regularization in DQN

Generalization and Regularization in DQN

Arxiv

6+阅读 · 2019年1月30日

A Probe into Understanding GAN and VAE models

A Probe into Understanding GAN and VAE models

Arxiv

9+阅读 · 2018年12月13日

A Tour of Reinforcement Learning: The View from Continuous Control

Arxiv

6+阅读 · 2018年6月25日

Understanding disentangling in $β$-VAE

Arxiv

4+阅读 · 2018年4月10日

Learning to Adapt: Meta-Learning for Model-Based Control

Arxiv

9+阅读 · 2018年3月30日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

微信扫码咨询专知VIP会员