Direct Future Prediction - Supervised Learning for Reinforcement Learning
This new formulation changes our neural network in several ways. Instead of just a state, we will also provide as input to the network the current measurements and goal. Instead of Q-values, our network will now output a prediction tensor of the form [Measurements X Actions X Offsets]. Taking the product of the summed predicted future changes and our goals, we can pick actions that best satisfy our goals over time:
The tutorial is written for those who would like an introduction to reinforcement learning (RL). The aim is to provide an intuitive presentation of the ideas rather than concentrate on the deeper mathematics underlying the topic. RL is generally used to solve the so-called Markov decision problem (MDP). In other words, the problem that you are attempting to solve with RL should be an MDP or its variant. The theory of RL relies on dynamic programming (DP) and artificial intelligence (AI). We will begin with a quick description of MDPs. We will discuss what we mean by “complex” and “large-scale” MDPs. Then we will explain why RL is needed to solve complex and large-scale MDPs. The semi-Markov decision problem (SMDP) will also be covered.
The tutorial is meant to serve as an introduction to these topics and is based mostly on the book: “Simulation-based optimization: Parametric Optimization techniques and reinforcement learning” . The book discusses this topic in greater detail in the context of simulators. There are at least two other textbooks that I would recommend you to read: (i) Neuro-dynamic programming  (lots of details on convergence analysis) and (ii) Reinforcement Learning: An Introduction  (lots of details on underlying AI concepts). A more recent tutorial on this topic is . This tutorial has 2 sections: • Section 2 discusses MDPs and SMDPs. • Section 3 discusses RL. By the end of this tutorial, you should be able to • Identify problem structures that can be set up as MDPs / SMDPs. • Use some RL algorithms.