## A Tutorial on Multi-Goal Reinforcement Learning

January 25, 2018 · CreateAMind

1. Direct Future Prediction - Supervised Learning for Reinforcement Learning
   https://flyyufelix.github.io/2017/11/17/direct-future-prediction.html

2. Original article: https://www.oreilly.com/ideas/reinforcement-learning-for-complex-goals-using-tensorflow

This new formulation changes our neural network in several ways. Instead of just a state, we will also provide as input to the network the current measurements and goal. Instead of Q-values, our network will now output a prediction tensor of the form [Measurements X Actions X Offsets]. Taking the product of the summed predicted future changes and our goals, we can pick actions that best satisfy our goals over time:
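The action-selection rule described above can be sketched in a few lines of NumPy. The shapes (3 measurements, 8 actions, 6 temporal offsets), the random stand-in for the network output, and the goal values are all hypothetical, chosen only for illustration; in the actual DFP architecture the prediction tensor is produced by the trained network from the state, measurements, and goal.

```python
import numpy as np

# Hypothetical sizes: 3 measurements (e.g. ammo, health, frags),
# 8 discrete actions, 6 temporal offsets into the future.
n_measurements, n_actions, n_offsets = 3, 8, 6

# Stand-in for the network's output: predicted future *changes* in each
# measurement, for each candidate action, at each temporal offset.
rng = np.random.default_rng(0)
predictions = rng.standard_normal((n_measurements, n_actions, n_offsets))

# The goal vector weights each measurement (e.g. value frags most,
# but still care about keeping health and ammo).
goal = np.array([0.5, 0.5, 1.0])

# Sum the predicted changes over all future offsets, take the product
# with the goal to score each action, and pick the best-scoring action.
summed = predictions.sum(axis=2)      # shape: [measurements, actions]
action_scores = goal @ summed         # shape: [actions]
best_action = int(np.argmax(action_scores))
```

Because the goal enters only at this final scoring step, the same trained predictor can serve different goals at test time simply by swapping the goal vector.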

`https://mp.weixin.qq.com/s/XHdaoOWBgOWX7SrOemY4jw`

### More

Deep Reinforcement Learning via Policy Optimization

The tutorial is written for those who would like an introduction to reinforcement learning (RL). The aim is to provide an intuitive presentation of the ideas rather than concentrate on the deeper mathematics underlying the topic. RL is generally used to solve the so-called Markov decision problem (MDP). In other words, the problem that you are attempting to solve with RL should be an MDP or its variant. The theory of RL relies on dynamic programming (DP) and artificial intelligence (AI). We will begin with a quick description of MDPs. We will discuss what we mean by “complex” and “large-scale” MDPs. Then we will explain why RL is needed to solve complex and large-scale MDPs. The semi-Markov decision problem (SMDP) will also be covered.
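To make the DP connection concrete, here is a minimal value-iteration sketch on a toy two-state, two-action MDP. The transition probabilities, rewards, and discount factor are invented for illustration and do not come from the tutorial; the Bellman backup itself is the standard one.

```python
import numpy as np

# Toy MDP: states s0, s1 and actions a0, a1 (all numbers illustrative).
# P[a, s, s'] is the probability of moving from s to s' under action a;
# R[a, s] is the immediate reward for taking action a in state s.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under a0
              [[0.5, 0.5], [0.4, 0.6]]])  # transitions under a1
R = np.array([[1.0, 0.0],                 # rewards for a0 in s0, s1
              [0.5, 2.0]])                # rewards for a1 in s0, s1
gamma = 0.9                               # discount factor

# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
V = np.zeros(2)
for _ in range(200):
    Q = R.T + gamma * np.einsum('ast,t->sa', P, V)  # Q[s, a]
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)  # greedy action in each state
```

RL algorithms such as Q-learning target the same fixed point, but estimate it from sampled transitions instead of requiring the full transition and reward model used here.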

The tutorial is meant to serve as an introduction to these topics and is based mostly on the book “Simulation-based Optimization: Parametric Optimization Techniques and Reinforcement Learning” [4]. The book discusses this topic in greater detail in the context of simulators. There are at least two other textbooks that I would recommend you to read: (i) Neuro-Dynamic Programming [2] (lots of details on convergence analysis) and (ii) Reinforcement Learning: An Introduction [11] (lots of details on underlying AI concepts). A more recent tutorial on this topic is [8]. This tutorial has two sections:

- Section 2 discusses MDPs and SMDPs.
- Section 3 discusses RL.

By the end of this tutorial, you should be able to:

- Identify problem structures that can be set up as MDPs / SMDPs.
- Use some RL algorithms.
