Reading Note

Sutton & Barto Reading Note: Chapter 3

In the last note, we covered the first two chapters of the book and discussed the tabular case of RL (bandit problems). In this note, we will discuss the finite Markov decision process (MDP) and the Bellman equation. Agent-Environment Interface, Goals and Rewards: As this series assumes readers have some idea that “RL learns from interactions with the environment”, we will only briefly introduce the agent-environment interface here. It can be illustrated in a diagram as below:...

Sutton & Barto Reading Note: Chapter 1-2

I am reviewing the book Reinforcement Learning: An Introduction by Sutton and Barto. This post covers the first two chapters of the book. Since this is the very first note in the series, I should explain why I am writing these notes. First, RL is worth reviewing even in an era where LLM/AIGC is the new hype. Second, I am preparing for my job search and graduate study....