Representation Learning with RL: SPR

Data-Efficient Reinforcement Learning with Self-Predictive Representations. As we saw in the previous post, the policy similarity metric (PSM) uses a specially designed bisimulation relation to force the representation network to learn the transition dynamics. This post gives a brief overview of another method, self-predictive representations (SPR), which learns the transition dynamics in a more explicit way. The goal of SPR is to improve sample efficiency through a self-supervised process, which leverages limitless training signals from self-prediction....
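As a rough illustration of the idea (a minimal sketch, not SPR's exact architecture; the encoders, transition model, and all dimensions below are simplified stand-ins), a latent transition model predicts future representations, which are matched against the output of a slowly updated target encoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPRSketch(nn.Module):
    """Minimal sketch of a self-predictive objective (hypothetical shapes/names)."""

    def __init__(self, obs_dim=64, act_dim=4, latent_dim=32, tau=0.99):
        super().__init__()
        self.online_encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        self.target_encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        self.target_encoder.load_state_dict(self.online_encoder.state_dict())
        # Transition model: predicts the next latent from (latent, action).
        self.transition = nn.Linear(latent_dim + act_dim, latent_dim)
        # Prediction head applied before the similarity loss.
        self.predictor = nn.Linear(latent_dim, latent_dim)
        self.tau = tau

    @torch.no_grad()
    def update_target(self):
        # Target encoder is an exponential moving average of the online encoder.
        for p_t, p_o in zip(self.target_encoder.parameters(),
                            self.online_encoder.parameters()):
            p_t.mul_(self.tau).add_((1 - self.tau) * p_o)

    def loss(self, obs, actions, next_obs_seq):
        # obs: (B, obs_dim); actions: (B, K, act_dim) one-hot;
        # next_obs_seq: (B, K, obs_dim) holds K future observations.
        z = self.online_encoder(obs)
        total = 0.0
        for k in range(actions.shape[1]):
            z = self.transition(torch.cat([z, actions[:, k]], dim=-1))
            pred = F.normalize(self.predictor(z), dim=-1)
            with torch.no_grad():
                target = F.normalize(self.target_encoder(next_obs_seq[:, k]), dim=-1)
            # Negative cosine similarity between predicted and target latents.
            total = total - (pred * target).sum(-1).mean()
        return total
```

The key point is that the prediction targets come from the agent's own future observations, so every transition in the replay buffer provides a training signal at no extra environment cost.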

September 1, 2022 · Dibbla

Representation Learning with RL: SimCLR to PSM

Representation learning has been widely used and studied in CV and NLP, so it is not surprising that people transfer these methods and ideas to reinforcement learning, especially for generalization and data efficiency. SimCLR, a widely used self-supervised learning (SSL) method, has achieved excellent performance on CV tasks. The basic idea is to learn a representation; ideally, the representation of an image is a high-level abstraction of its content. SimCLR forces the representation network to learn invariances across augmented views of an image with a carefully designed structure....
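As a sketch of that idea (a simplified NT-Xent contrastive loss; the batch size, temperature, and the `encoder`/`projector`/`aug` names in the usage comment are assumptions, not the post's code):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss in the spirit of SimCLR (simplified sketch).

    z1, z2: (B, D) projections of two augmented views of the same images.
    (z1[i], z2[i]) are positive pairs; all other samples act as negatives.
    """
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D), unit norm
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    # A sample must never be its own negative: mask the diagonal out.
    sim = sim.masked_fill(torch.eye(2 * B, dtype=torch.bool), float('-inf'))
    # The positive for row i is row i + B (and vice versa).
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return F.cross_entropy(sim, targets)

# Hypothetical usage: z1 = projector(encoder(aug(x))); z2 = projector(encoder(aug(x)))
loss = nt_xent_loss(torch.randn(8, 16), torch.randn(8, 16))
```

Pulling the two augmented views together while pushing apart the rest of the batch is what makes the learned representation invariant to the augmentations.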

August 30, 2022 · Dibbla

Compute Gradient for Matrix

This is an additional but useful note. First, recap derivatives for scalars, for example: $\frac{dy}{dx} = nx^{n-1}$ for $y = x^n$. We all know the rules for different kinds of functions and composed functions; note that the derivative does not always exist. When we generalize derivatives to gradients, we are generalizing from scalars to vectors, and in this case the shape matters:

|                     | scalar $x$                               | vector $\textbf{x}$                               |
|---------------------|------------------------------------------|---------------------------------------------------|
| scalar $y$          | $\frac{\partial y}{\partial x}$          | $\frac{\partial y}{\partial \textbf{x}}$          |
| vector $\textbf{y}$ | $\frac{\partial \textbf{y}}{\partial x}$ | $\frac{\partial \textbf{y}}{\partial \textbf{x}}$ |

Case 1: $y$ is a scalar, $\textbf{x}$ is a vector: $$\textbf{x} = [x_1,x_2,x_3,\cdots,x_n]^T$$ $$\frac{\partial y}{\partial \textbf{x}}=\left[\frac{\partial y}{\partial x_1},\frac{\partial y}{\partial x_2},\cdots,\frac{\partial y}{\partial x_n}\right]$$...
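As a quick check of this convention (my example, not from the note): for $y=\textbf{x}^T\textbf{x}=\sum_i x_i^2$, each component is $\frac{\partial y}{\partial x_i}=2x_i$, so $$\frac{\partial y}{\partial \textbf{x}} = [2x_1,2x_2,\cdots,2x_n] = 2\textbf{x}^T,$$ a row vector, consistent with the numerator layout above.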

August 6, 2022 · Dibbla

Hanabi Paper List

Dibbla: This file/list contains several papers about Hanabi, mostly focused on two ideas: MCTS methods and learning a protocol. Theoretical Method: Playing Hanabi Near-Optimally. From a theoretical view, this paper provides a hat-guessing strategy that reaches a nearly full score in some settings; check here. Survey: The Hanabi challenge: A new frontier for AI research (check here) and The 2018 Hanabi Competition (check here). MCTS: Re-determinizing MCTS in Hanabi (check here)...

July 16, 2022 · Dibbla

Tutorial 3-1: RNN

By Yinggan XU (Dibbla). This note comes from a previous course (not included in Lee’s 2022 series); the video can be found here: RNN. RNNs aim to deal with sequential inputs. We can first focus on the problem of slot filling: Time: ______ Destination: ______. Here, Time and Destination are the slots, and we would like to fill them in automatically from a given sentence: I would like to fly to Taipei on Nov 2nd....
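As a minimal sketch of such a tagger (illustrative NumPy code; the dimensions and the three-way slot set such as {Time, Destination, Other} are my assumptions, not the tutorial's): the hidden state is updated word by word, and a softmax over it scores a slot label for each word.

```python
import numpy as np

def rnn_slot_tagger(xs, W_x, W_h, W_y, b_h, b_y):
    """xs: list of word vectors; returns one slot-label distribution per word."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b_h)   # recurrent hidden-state update
        logits = W_y @ h + b_y
        probs = np.exp(logits - logits.max())
        outputs.append(probs / probs.sum())    # softmax over slot labels
    return outputs

# Example: 3 slot labels, 5-dim word vectors, 8-dim hidden state.
rng = np.random.default_rng(0)
d_in, d_h, n_slots = 5, 8, 3
params = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)),
          rng.normal(size=(n_slots, d_h)), np.zeros(d_h), np.zeros(n_slots))
sentence = [rng.normal(size=d_in) for _ in range(4)]  # stand-ins for word embeddings
print(rnn_slot_tagger(sentence, *params)[0])
```

Because the hidden state carries information from earlier words, the same word (e.g., "Taipei") can be tagged differently depending on the context before it.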

July 15, 2022 · Dibbla