Representation Learning with RL: SPR
Data-Efficient Reinforcement Learning with Self-Predictive Representations As we see in the blog, policy similarity metric (PSM) uses a specially designed bisimulation relation to force representation network to learn the transition dynamics. This blog will give a brief overview of another method, self-predictive dynamics, which learns about transition dynamics in a more explicit way. The goal of SPR is to improve the sample-efficiency with self-supervised process. This leverages limitless training signals from self-predictive process. The very high level idea is: the representation component of the architecture will predict a piece of future trajectory, then we minimize the gap between predicted future state and the real future state. The trained representations will be later fed to q-learning head as the input of Rainbow. Intuitively, the representation is forced to understand the environment dynamic. ...