Sutton & Barto Reading Note: Chapter 3

In the last note, we have covered first 2 chapters of the book, and discussed about the tabular cases of RL(Bandit problems). In this note, we will discuss the Finite Markov Decision Process(MDP) and the Bellman Equation. Agent-Environment Interface, Goals and Rewards As in this series we assume readers have some ideas about “RL learns from interactions with the environment”, we will only briefly introduce the agent-environment interface here. It can be illustrated in a diagram as below:...

June 16, 2024 · Dibbla

Sutton & Barto Reading Note: Chapter 3

In the last note, we have covered first 2 chapters of the book, and discussed about the tabular cases of RL(Bandit problems). In this note, we will discuss the Finite Markov Decision Process(MDP) and the Bellman Equation. Agent-Environment Interface, Goals and Rewards As in this series we assume readers have some ideas about “RL learns from interactions with the environment”, we will only briefly introduce the agent-environment interface here. It can be illustrated in a diagram as below:...

June 16, 2024 · Dibbla

Sutton & Barto Reading Note: Chapter 1-2

I am reviewing the book, Reinforcement Learning: An Introduction by Sutton and Barto. This post covers the first two chapters of the book. As the very first note in this series, it is good to explain why I write these notes. First of all, it is good to review RL even in this era where LLM/AIGC is the new hype. Secondly, I am preparing for my job search and grad study....

June 13, 2024 · Dibbla

Sutton & Barto Reading Note: Chapter 1-2

I am reviewing the book, Reinforcement Learning: An Introduction by Sutton and Barto. This post covers the first two chapters of the book. As the very first note in this series, it is good to explain why I write these notes. First of all, it is good to review RL even in this era where LLM/AIGC is the new hype. Secondly, I am preparing for my job search and grad study....

June 13, 2024 · Dibbla

Household DRL 0: DQN

DQN, Deep Q Network, is one of the most famous deep reinforcement learning algorithms that combined deep learning with reinforcement learning and really impressed people at that time. In this note, the basic idea of DQN is covered along with implemented code. Q-Learning Before talking about DQN, we shall discuss Q-learning first. What is Q-learning learning? Q-learning is a value-based method whose purpose is to learn a value function. In order to achieve this goal, we can adopt the $q$ value, which is action-value function:...

December 10, 2022 · Dibbla