The Intervention-based Imitation Learning (IIL) Family

From DAgger to HG-DAgger and more recent advances. Dataset Aggregation (DAgger) is an imitation learning algorithm proposed in the AISTATS 2011 paper A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning by Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. It is a simple yet effective algorithm that has been widely used in imitation learning, and as you can tell from the title, it's not related to human-in-the-loop RL....

October 21, 2023 · Dibbla

A Mini Scribe for FCT & MEC

This is a scribe note for CS294-082 by Prof. Gerald Friedland at UC Berkeley. The Idea of the Function Counting Theorem. First we start with the Function Counting Theorem (Cover's Theorem, Thomas M. Cover, 1965). For example, we have a 2-dimensional space with 4 points, and there are multiple ways to linearly separate them. Look at $l_5$: it separates $x_1$, $x_4$ on the left side from $x_2$, $x_3$ on the right side....
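Cover's counting formula can be checked numerically on this example; the helper below is an illustrative sketch, with `dof` standing for the separator's degrees of freedom ($d$ for hyperplanes through the origin in $\mathbb{R}^d$, $d+1$ once a bias term is allowed):

```python
from math import comb

def cover_count(n_points, dof):
    """Cover (1965): number of dichotomies of n_points in general position
    realizable by a linear threshold function with dof degrees of freedom."""
    return 2 * sum(comb(n_points - 1, k) for k in range(dof))

# 4 points in the plane, separated by affine lines (d = 2 plus a bias, dof = 3):
print(cover_count(4, 3))  # 14 of the 2**4 = 16 dichotomies are linearly separable
```

For four points in convex position, the two missing dichotomies are the XOR-style labelings, which no single line can realize.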

February 27, 2023 · Dibbla

Household DRL 0: DQN

DQN, the Deep Q-Network, is one of the most famous deep reinforcement learning algorithms: it combined deep learning with reinforcement learning and really impressed people at the time. In this note, the basic idea of DQN is covered along with an implementation. Q-Learning. Before talking about DQN, we shall discuss Q-learning first. What is Q-learning learning? Q-learning is a value-based method whose purpose is to learn a value function. In order to achieve this goal, we can adopt the $Q$-value, which is the action-value function:...
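For reference, the action-value function and the tabular Q-learning update in their textbook forms are:

$$Q^\pi(s, a) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t r_{t+1} \;\middle|\; s_0 = s,\ a_0 = a\right]$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$$

DQN replaces the table with a neural network $Q_\theta$ and stabilizes the same update with a replay buffer and a target network.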

December 10, 2022 · Dibbla

Notes on Generalization/Cross-Embodiment Experiments

In the paper Generalizable Imitation Learning from Observation via Inferring Goal Proximity, the idea of task structure/task information is proposed without further citation or reference. This high-level task structure generalizes to new situations and thus helps us quickly learn the task in them. As for current AIRL methods: such learned reward functions often overfit to the expert demonstrations by learning spurious correlations between task-irrelevant features and expert/agent labels [CoRL21], and thus they generalize poorly to initial and goal configurations that differ slightly from those seen in the demonstrations (e....

October 25, 2022 · Dibbla

RL generalization: Generalizable LfO via Inferring Goal Proximity

Paper Here; Official Blog Here. Generalizable Imitation Learning from Observation via Inferring Goal Proximity is a NeurIPS 2021 paper that focuses on the generalization problem of Learning from Observation (LfO). The idea of the paper is quite straightforward, without much mathematical machinery. In this blog I will show the high-level idea and the experimental setting of the paper. Preliminaries: LfO and the "goal" idea. LfO is an imitation learning setting in which we cannot access the action information of the experts' demonstrations....
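As a rough sketch of the goal-proximity idea: label each expert state with a discounted measure of how close it is to the end of its demonstration, then regress a network onto those labels. The discount value, network size, and MSE objective below are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

DELTA = 0.95  # proximity discount (assumed value)

def proximity_targets(traj_len: int) -> torch.Tensor:
    """Label the t-th state of a traj_len-step expert demo with
    DELTA**(traj_len - 1 - t), so states nearer the goal get proximity near 1."""
    return torch.tensor([DELTA ** (traj_len - 1 - t) for t in range(traj_len)])

obs_dim = 8  # hypothetical observation size
f = nn.Sequential(  # small MLP f(s): state -> predicted goal proximity in (0, 1)
    nn.Linear(obs_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

states = torch.randn(20, obs_dim)  # stand-in for one 20-step expert trajectory
loss = nn.functional.mse_loss(f(states).squeeze(-1), proximity_targets(20))
loss.backward()  # only state observations are needed, matching the LfO setting
```

At policy-learning time, the predicted proximity (or its increase between consecutive states) can then serve as a dense reward, which is what lets the learner generalize beyond the demonstrated configurations.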

October 22, 2022 · Dibbla