The Intervention-based Imitation Learning (IIL) Family

From DAgger, to HG-DAgger, and more recent advances. Dataset Aggregation (DAgger) is an imitation learning algorithm proposed in the AISTATS 2011 paper A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning by Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. It is a simple yet effective algorithm that has been widely used in imitation learning, and as you can tell from the title, it is not related to human-in-the-loop RL....

October 21, 2023 · Dibbla
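
As a quick orientation, here is a minimal sketch of the DAgger loop in Python. The `env`, `expert`, and `policy` interfaces are hypothetical placeholders, not the paper's pseudocode: at each iteration the current policy is rolled out, every visited state is labeled with the expert's action, and the policy is retrained on the aggregated dataset.

```python
# Minimal DAgger loop sketch (hypothetical env/expert/policy interfaces):
# roll out a mixture policy, relabel visited states with expert actions,
# aggregate into one dataset, and retrain with supervised learning.
import numpy as np

def dagger(env, expert, policy, n_iters=10, horizon=200, beta0=1.0):
    states, actions = [], []                      # aggregated dataset D
    for i in range(n_iters):
        beta = beta0 * (0.5 ** i)                 # decaying expert mixing rate
        s = env.reset()
        for _ in range(horizon):
            # execute a mixture of expert and learner, as in DAgger's beta schedule
            a_exec = expert.act(s) if np.random.rand() < beta else policy.act(s)
            states.append(s)
            actions.append(expert.act(s))         # always label with the expert
            s, done = env.step(a_exec)            # assumed (next_state, done) return
            if done:
                break
        policy.fit(states, actions)               # supervised learning on all of D
    return policy
```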

Notes on Generalization/Cross-Embodiment Experiments

In the paper Generalizable Imitation Learning from Observation via Inferring Goal Proximity, the idea of task structure/task information is introduced without further citation or reference: this high-level task structure generalizes to new situations and thus helps an agent quickly learn the task there. On current AIRL methods, the paper notes: "However, such learned reward functions often overfit to the expert demonstrations by learning spurious correlations between task-irrelevant features and expert/agent labels" (CoRL21), and thus they fail to generalize to initial and goal configurations that differ slightly from the ones seen in the demonstrations (e....

October 25, 2022 · Dibbla
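
To make the quoted failure mode concrete, here is a simplified, GAIL-style sketch of a discriminator-based learned reward in PyTorch (hypothetical code, not the exact AIRL parameterization): since the discriminator only has to separate expert from agent observations, any task-irrelevant feature that differs between the two datasets suffices, which is exactly the spurious correlation described above.

```python
# Simplified discriminator-reward sketch (GAIL-style, not AIRL's exact
# f = g(s) + gamma*h(s') - h(s) structure). The discriminator logit is
# reused as the reward; if task-irrelevant features (e.g. background
# pixels) differ between expert and agent data, the discriminator can
# separate the two from those features alone and the reward overfits.
import torch
import torch.nn as nn

class LearnedReward(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)          # logit f(s), used as reward

def discriminator_loss(f, expert_obs, agent_obs):
    # binary cross-entropy with expert transitions labeled 1, agent 0
    bce = nn.BCEWithLogitsLoss()
    loss_e = bce(f(expert_obs), torch.ones(len(expert_obs)))
    loss_a = bce(f(agent_obs), torch.zeros(len(agent_obs)))
    return loss_e + loss_a
```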

RL generalization: Generalizable LfO via Inferring Goal Proximity

Paper Here; Official Blog Here. Generalizable Imitation Learning from Observation via Inferring Goal Proximity is a NeurIPS 2021 paper that focuses on the generalization problem of Learning from Observation (LfO). The idea of the paper is quite straightforward and needs little mathematical machinery. In this post I will present the high-level idea and the experimental setup of the paper. Preliminaries: LfO and the "goal" idea. LfO is an imitation learning setting where we cannot access the action information of the experts' demonstrations....

October 22, 2022 · Dibbla
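
A rough sketch of the goal-proximity idea, under assumed interfaces rather than the authors' released code: a proximity network is regressed toward discounted targets that peak at the goal along expert state trajectories, and its output can then serve as a dense reward. Only states are needed, which matches the LfO setting.

```python
# Goal-proximity sketch (hypothetical interfaces, not the official release):
# learn a scalar proximity in [0, 1] that grows toward the goal along
# expert trajectories; the trained network then provides a dense,
# action-free reward signal for policy learning.
import torch
import torch.nn as nn

class Proximity(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid()    # proximity in (0, 1)
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def proximity_targets(traj_len, discount=0.95):
    # discounted target discount**(T - 1 - t): 1 at the goal state,
    # decaying for states further from it along the expert trajectory
    t = torch.arange(traj_len, dtype=torch.float32)
    return discount ** (traj_len - 1 - t)
```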