spinningup.openai: A Complete Reinforcement Learning Resource

December 17, 2018 · CreateAMind

Welcome to Spinning Up in Deep RL!

User Documentation

  • Introduction

    • What This Is

    • Why We Built This

    • How This Serves Our Mission

    • Code Design Philosophy

    • Support Plan

  • Installation

    • Installing Python

    • Installing OpenMPI

    • Installing Spinning Up

    • Check Your Install

    • Installing MuJoCo (Optional)

  • Algorithms

    • What’s Included

    • Why These Algorithms?

    • Code Format

  • Running Experiments

    • Launching from the Command Line

    • Launching from Scripts

  • Experiment Outputs

    • Algorithm Outputs

    • Save Directory Location

    • Loading and Running Trained Policies

  • Plotting Results

Introduction to RL

  • Part 1: Key Concepts in RL

    • What Can RL Do?

    • Key Concepts and Terminology

    • (Optional) Formalism

  • Part 2: Kinds of RL Algorithms

    • A Taxonomy of RL Algorithms

    • Links to Algorithms in Taxonomy

  • Part 3: Intro to Policy Optimization

    • Deriving the Simplest Policy Gradient

    • Implementing the Simplest Policy Gradient

    • Expected Grad-Log-Prob Lemma

    • Don’t Let the Past Distract You

    • Implementing Reward-to-Go Policy Gradient

    • Baselines in Policy Gradients

    • Other Forms of the Policy Gradient

    • Recap


  • Spinning Up as a Deep RL Researcher

    • The Right Background

    • Learn by Doing

    • Developing a Research Project

    • Doing Rigorous Research in RL

    • Closing Thoughts

    • PS: Other Resources

    • References

  • Key Papers in Deep RL

    • 1. Model-Free RL

    • 2. Exploration

    • 3. Transfer and Multitask RL

    • 4. Hierarchy

    • 5. Memory

    • 6. Model-Based RL

    • 7. Meta-RL

    • 8. Scaling RL

    • 9. RL in the Real World

    • 10. Safety

    • 11. Imitation Learning and Inverse Reinforcement Learning

    • 12. Reproducibility, Analysis, and Critique

    • 13. Bonus: Classic Papers in RL Theory or Review

  • Exercises

    • Problem Set 1: Basics of Implementation

    • Problem Set 2: Algorithm Failure Modes

    • Challenges

  • Benchmarks for Spinning Up Implementations

    • Performance in Each Environment

    • Experiment Details

Algorithms Docs

  • Vanilla Policy Gradient

    • Background

    • Documentation

    • References

  • Trust Region Policy Optimization

    • Background

    • Documentation

    • References

  • Proximal Policy Optimization

    • Background

    • Documentation

    • References

  • Deep Deterministic Policy Gradient

    • Background

    • Documentation

    • References

  • Twin Delayed DDPG

    • Background

    • Documentation

    • References

  • Soft Actor-Critic

    • Background

    • Documentation

    • References

Utilities Docs

  • Logger

    • Using a Logger

    • Logger Classes

    • Loading Saved Graphs

  • Plotter

  • MPI Tools

    • Core MPI Utilities

    • MPI + Tensorflow Utilities

  • Run Utils

    • ExperimentGrid

    • Calling Experiments


  • Acknowledgements

  • About the Author

Indices and tables

  • Index

  • Module Index

  • Search Page



OpenAI is a non-profit artificial intelligence organization founded jointly by a number of Silicon Valley figures. In 2015, after a series of conversations with other Silicon Valley technology leaders, Elon Musk decided to co-found OpenAI, with the goal of preventing catastrophic outcomes from artificial intelligence and steering it toward positive impact. Musk (founder of Tesla and SpaceX), Y Combinator president Sam Altman, angel investor Peter Thiel, and other Silicon Valley figures pledged $1 billion to OpenAI in December 2015.

The tutorial is written for those who would like an introduction to reinforcement learning (RL). The aim is to provide an intuitive presentation of the ideas rather than concentrate on the deeper mathematics underlying the topic. RL is generally used to solve the so-called Markov decision problem (MDP). In other words, the problem that you are attempting to solve with RL should be an MDP or its variant. The theory of RL relies on dynamic programming (DP) and artificial intelligence (AI). We will begin with a quick description of MDPs. We will discuss what we mean by “complex” and “large-scale” MDPs. Then we will explain why RL is needed to solve complex and large-scale MDPs. The semi-Markov decision problem (SMDP) will also be covered.
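To make the MDP idea concrete, here is a minimal sketch of value iteration (the core dynamic-programming routine the tutorial builds on) applied to a hypothetical 2-state, 2-action MDP. The transition table and reward numbers are invented for illustration and do not come from the tutorial.

```python
# P[s][a] = list of (probability, next_state, reward) transitions.
# A hypothetical 2-state, 2-action MDP, invented for illustration.
P = {
    0: {0: [(0.9, 0, 5.0), (0.1, 1, 5.0)],
        1: [(0.2, 0, 10.0), (0.8, 1, 10.0)]},
    1: {0: [(0.7, 0, -5.0), (0.3, 1, -5.0)],
        1: [(0.4, 0, 1.0), (0.6, 1, 1.0)]},
}
GAMMA = 0.9  # discount factor

def value_iteration(P, gamma, tol=1e-8):
    """Repeatedly apply the Bellman optimality backup until convergence."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Best expected discounted return over actions in state s.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(P, GAMMA)
print(V)  # optimal state values for the toy MDP
```

The same backup generalizes to any finite MDP; "large-scale" in the tutorial's sense means the state space is too big to enumerate `V` like this, which is where RL comes in.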

The tutorial is meant to serve as an introduction to these topics and is based mostly on the book "Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning" [4], which discusses this topic in greater detail in the context of simulators. There are at least two other textbooks I would recommend: (i) Neuro-Dynamic Programming [2] (lots of detail on convergence analysis) and (ii) Reinforcement Learning: An Introduction [11] (lots of detail on the underlying AI concepts). A more recent tutorial on this topic is [8]. This tutorial has two sections:

  • Section 2 discusses MDPs and SMDPs.

  • Section 3 discusses RL.

By the end of this tutorial, you should be able to:

  • Identify problem structures that can be set up as MDPs / SMDPs.

  • Use some RL algorithms.
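As a taste of "using some RL algorithms", the sketch below runs tabular Q-learning on a hypothetical 5-state chain (move right to reach a rewarding goal state). The environment, constants, and names are invented for illustration; the behavior policy is uniformly random, which is valid for off-policy Q-learning.

```python
import random

# Hypothetical 5-state chain: states 0..4, action 1 moves right,
# action 0 moves left; reward 1.0 only on reaching the goal state 4.
N, GOAL = 5, 4
ALPHA, GAMMA = 0.5, 0.9

def step(s, a):
    s2 = min(GOAL, s + 1) if a == 1 else max(0, s - 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N)]
rng = random.Random(0)
for _ in range(500):                       # episodes
    s = 0
    for _ in range(50):                    # steps per episode
        a = rng.randrange(2)               # random exploration (off-policy)
        s2, r, done = step(s, a)
        # Q-learning backup: bootstrap from the greedy value of s2.
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break

# Greedy policy extracted from the learned Q-table.
policy = [max((0, 1), key=lambda x: Q[s][x]) for s in range(N)]
print(policy)
```

In every non-terminal state the learned greedy action is 1 (move right), matching the obvious optimal policy for this chain.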


Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.
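The surrogate-model idea from the survey can be illustrated with a deliberately tiny sketch: fit a simple dynamics model from a handful of real transitions, then plan against the model instead of the real system. The linear system, the target, and all names below are invented for illustration and stand in for the far richer models (Gaussian processes, neural dynamics models) the survey discusses.

```python
def real_step(s, a):
    """The 'expensive' true system; its dynamics (s' = s + 2a) are
    unknown to the agent and only observable through interaction."""
    return s + 2.0 * a

# Micro-data regime: only five real interactions are collected.
data = [(0.0, a, real_step(0.0, a)) for a in (-1.0, -0.5, 0.0, 0.5, 1.0)]

# Fit a surrogate dynamics model s' = s + w*a by least squares.
num = sum(a * (s2 - s) for s, a, s2 in data)
den = sum(a * a for s, a, s2 in data)
w = num / den                  # learned action gain (true value is 2.0)

def model_step(s, a):
    """Cheap surrogate the optimizer can query thousands of times."""
    return s + w * a

# Plan in the surrogate: pick the action that drives the state to 3.0.
target, s = 3.0, 0.0
a = (target - s) / w           # one-step greedy plan under the model
print(w, real_step(s, a))      # executing the plan on the real system
```

The point of the design is the query asymmetry: the least-squares fit and the planning step consume model evaluations, while the real system is touched only five times plus once to execute the plan.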
