In decision-making problems with continuous state and action spaces, linear dynamical models are widely employed. In particular, policies for stochastic linear systems subject to quadratic cost functions capture a large number of applications in reinforcement learning. Several randomized policies that address the trade-off between identification and control have been studied in the recent literature. However, little is known about policies based on bootstrapping observed states and actions. In this work, we show that bootstrap-based policies achieve square-root scaling of regret with respect to time. We also obtain results on the accuracy of learning the model's dynamics. Corroborative numerical analysis illustrating the technical results is also provided.
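The abstract does not specify the bootstrap procedure; as a minimal illustrative sketch (the 2×2 system, noise level, and all variable names below are hypothetical, not taken from the paper), one can resample observed state–action transitions with replacement and refit a least-squares estimate of the dynamics matrices [A, B] to quantify estimation uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true linear dynamics: x_{t+1} = A x_t + B u_t + w_t
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[1.0], [0.5]])

# Collect a trajectory under random exploratory actions
T = 500
x = np.zeros(2)
states, actions, next_states = [], [], []
for _ in range(T):
    u = rng.normal(size=1)
    x_next = A @ x + B @ u + 0.1 * rng.normal(size=2)
    states.append(x), actions.append(u), next_states.append(x_next)
    x = x_next

Z = np.hstack([np.array(states), np.array(actions)])  # regressors [x_t, u_t]
Y = np.array(next_states)                             # targets x_{t+1}

def ls_estimate(Z, Y):
    # Least-squares estimate of [A | B] from one (possibly resampled) dataset
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return theta.T  # shape (2, 3): estimated [A | B]

# Bootstrap: resample transition indices with replacement, refit each time
n_boot = 200
estimates = np.stack(
    [ls_estimate(Z[idx], Y[idx])
     for idx in (rng.integers(0, T, size=T) for _ in range(n_boot))]
)

theta_hat = ls_estimate(Z, Y)        # point estimate of [A | B]
spread = estimates.std(axis=0)       # bootstrap spread of each entry
print("point estimate of [A | B]:\n", theta_hat)
print("bootstrap std of each entry:\n", spread)
```

The bootstrap spread shrinks as more transitions are observed, consistent with the model-accuracy results the abstract refers to; an adaptive policy could, for instance, act on resampled estimates rather than the point estimate alone.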