RL generalization: Generalizable LfO via Inferring Goal Proximity

Paper Here; Official Blog Here Generalizable Imitation Learning from Observation via Inferring Goal Proximity is a NIPS2021 paper which focuses on the generalization problem of Learning from Demonstration(LfO). The idea of the paper is quite straightforward without much mathematical explanations. In this blog I will show the high-level idea and experiment setting of the paper. Preliminaries: LfO and “Goal” idea LfO is an imitation learning setting, where we cannot access the action information of experts’ demonstrations....

October 22, 2022 · Dibbla

RL generalization: Generalizable LfO via Inferring Goal Proximity

Paper Here; Official Blog Here Generalizable Imitation Learning from Observation via Inferring Goal Proximity is a NIPS2021 paper which focuses on the generalization problem of Learning from Demonstration(LfO). The idea of the paper is quite straightforward without much mathematical explanations. In this blog I will show the high-level idea and experiment setting of the paper. Preliminaries: LfO and “Goal” idea LfO is an imitation learning setting, where we cannot access the action information of experts’ demonstrations....

October 22, 2022 · Dibbla

RL generalization: 2 Evaluations

It is obvious that to propose a problem better, one has to illustrate the problem well. RL generalization, as the survey indicated, is a class of problems. And here, we show two benchmark environments and their common experiment settings. Procgen Following Coinrun, OpenAI’s team proposed a new testing environment called procgen. Consisting of 16 games, the Procgen provides a convenient way to generate environments procedurally that share the same underlying logic and reward but are different in layout and rendering....

September 24, 2022 · Dibbla

RL generalization: 2 Evaluations

It is obvious that to propose a problem better, one has to illustrate the problem well. RL generalization, as the survey indicated, is a class of problems. And here, we show two benchmark environments and their common experiment settings. Procgen Following Coinrun, OpenAI’s team proposed a new testing environment called procgen. Consisting of 16 games, the Procgen provides a convenient way to generate environments procedurally that share the same underlying logic and reward but are different in layout and rendering....

September 24, 2022 · Dibbla

Representation Learning with RL: SPR

Data-Efficient Reinforcement Learning with Self-Predictive Representations As we see in the blog, policy similarity metric (PSM) uses a specially designed bisimulation relation to force representation network to learn the transition dynamics. This blog will give a brief overview of another method, self-predictive dynamics, which learns about transition dynamics in a more explicit way. The goal of SPR is to improve the sample-efficiency with self-supervised process. This leverages limitless training signals from self-predictive process....

September 1, 2022 · Dibbla