Sutton & Barto Reading Note: Chapter 3
In the last note, we covered the first two chapters of the book and discussed bandit problems, the simplest case covered in the book's tabular part of RL. In this note, we discuss finite Markov decision processes (MDPs) and the Bellman equation.

Agent-Environment Interface, Goals and Rewards

Since this series assumes readers already have some sense of how RL learns from interaction with an environment, we only briefly introduce the agent-environment interface here. It can be illustrated in a diagram as below:...
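To make the interface concrete in code, here is a minimal Python sketch of the interaction loop. This is an illustration of ours, not from the book. At each step t the agent observes a state S_t and picks an action A_t, and the environment responds with a reward R_{t+1} and a next state S_{t+1}, producing the trajectory S_0, A_0, R_1, S_1, A_1, R_2, and so on. The two-state environment, its dynamics table `P` (a tabular stand-in for p(s', r | s, a)), and the helpers `env_step`, `random_policy` and `run_episode` are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical finite MDP: P[(s, a)] lists (probability, next_state, reward),
# a tabular form of the dynamics p(s', r | s, a). Probabilities sum to 1 per (s, a).
P: Dict[Tuple[str, str], List[Tuple[float, str, float]]] = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "move"): [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    ("s1", "stay"): [(1.0, "s1", 0.5)],
    ("s1", "move"): [(1.0, "s0", 0.0)],
}
ACTIONS = ["stay", "move"]


def env_step(state: str, action: str, rng: random.Random) -> Tuple[float, str]:
    """Environment side: sample (R_{t+1}, S_{t+1}) from p(s', r | s, a)."""
    outcomes = P[(state, action)]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return reward, next_state


def random_policy(state: str, rng: random.Random) -> str:
    """Agent side: choose A_t given S_t (here uniformly at random, ignoring state)."""
    return rng.choice(ACTIONS)


def run_episode(start: str, steps: int, seed: int = 0) -> List[Tuple[str, str, float, str]]:
    """Roll out the loop and return (S_t, A_t, R_{t+1}, S_{t+1}) tuples."""
    rng = random.Random(seed)
    state = start
    trajectory = []
    for _ in range(steps):
        action = random_policy(state, rng)                 # agent acts on S_t
        reward, next_state = env_step(state, action, rng)  # environment responds
        trajectory.append((state, action, reward, next_state))
        state = next_state                                 # S_{t+1} becomes the new S_t
    return trajectory


if __name__ == "__main__":
    for t, (s, a, r, s_next) in enumerate(run_episode("s0", steps=5)):
        print(f"t={t}: S_t={s}, A_t={a}, R_(t+1)={r}, S_(t+1)={s_next}")
```

Swapping `random_policy` for a learned policy leaves the loop unchanged. That split between what the agent controls (the action) and what the environment determines (the reward and next state) is the essence of the interface.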