Sutton & Barto Reading Note: Chapter 3

In the last note, we covered the first two chapters of the book and discussed the tabular case of RL (bandit problems). In this note, we will discuss finite Markov Decision Processes (MDPs) and the Bellman equation. Agent-Environment Interface, Goals and Rewards: Since this series assumes readers have some idea that “RL learns from interactions with the environment”, we will only briefly introduce the agent-environment interface here. It can be illustrated in a diagram as below:...

June 16, 2024 · Dibbla

Sutton & Barto Reading Note: Chapter 1-2

I am reviewing the book Reinforcement Learning: An Introduction by Sutton and Barto. This post covers the first two chapters of the book. Since this is the very first note in the series, I should explain why I am writing these notes. First, RL is worth reviewing even in this era where LLM/AIGC is the new hype. Second, I am preparing for my job search and grad study....

June 13, 2024 · Dibbla

The Intervention-based Imitation Learning (IIL) Family

From DAgger to HG-DAgger and more recent advances. DAgger: Dataset Aggregation (DAgger) is an imitation learning algorithm proposed in the AISTATS 2011 paper A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning by Stéphane Ross, Geoffrey J. Gordon and J. Andrew Bagnell. It is a simple yet effective algorithm that has been widely used in imitation learning, and as you can tell from the title, it’s not related to human-in-the-loop RL....

October 21, 2023 · Dibbla