The Intervention-based Imitation Learning (IIL) Family

From DAgger to HG-DAgger and more recent advances. Dataset Aggregation (DAgger) is an imitation learning algorithm proposed in the AISTATS 2011 paper A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning by Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. It is a simple yet effective algorithm that has been widely used in imitation learning, and as you can tell from the title, it's not related to human-in-the-loop RL....

October 21, 2023 · Dibbla

A Mini Scribe for FCT & MEC

This is a scribe note for CS294-082 by Prof. Gerald Friedland at UC Berkeley. The Idea of the Function Counting Theorem. First we start with the Function Counting Theorem (Cover's Theorem, Thomas M. Cover, 1965). For example, we have a 2-dimensional space with 4 points, and there are multiple ways to linearly separate them. Look at $l_5$: it separates $x_1$, $x_4$ on the left side from $x_2$, $x_3$ on the right side....
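Cover's counting formula can be checked numerically on this example; the helper below is an illustrative sketch, with `dof` standing for the separator's degrees of freedom ($d$ for hyperplanes through the origin in $\mathbb{R}^d$, $d+1$ once a bias term is allowed):

```python
from math import comb

def cover_count(n_points, dof):
    """Cover (1965): number of dichotomies of n_points in general position
    realizable by a linear threshold function with dof degrees of freedom."""
    return 2 * sum(comb(n_points - 1, k) for k in range(dof))

# 4 points in the plane, separated by affine lines (d = 2 plus a bias, dof = 3):
print(cover_count(4, 3))  # 14 of the 2**4 = 16 dichotomies are linearly separable
```

For four points in convex position, the two missing dichotomies are the XOR-style labelings, which no single line can realize.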

February 27, 2023 · Dibbla

Household DRL 0: DQN

DQN, the Deep Q-Network, is one of the most famous deep reinforcement learning algorithms: it combined deep learning with reinforcement learning and really impressed people at the time. In this note, the basic idea of DQN is covered along with an implementation. Q-Learning. Before talking about DQN, we shall discuss Q-learning first. What is Q-learning learning? Q-learning is a value-based method whose purpose is to learn a value function. In order to achieve this goal, we can adopt the $Q$-value, which is the action-value function:...
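For reference, the action-value function and the tabular Q-learning update in their textbook forms are:

$$Q^\pi(s, a) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t r_{t+1} \;\middle|\; s_0 = s,\ a_0 = a\right]$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$$

DQN replaces the table with a neural network $Q_\theta$ and stabilizes the same update with a replay buffer and a target network.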

December 10, 2022 · Dibbla

Notes on Generalization/Cross-Embodiment Experiments

In the paper Generalizable Imitation Learning from Observation via Inferring Goal Proximity, the idea of task structure/task information is proposed without further citation or reference. This high-level task structure generalizes to new situations and thus helps us quickly learn the task in them. As for current AIRL methods: such learned reward functions often overfit to the expert demonstrations by learning spurious correlations between task-irrelevant features and expert/agent labels [CoRL21], and thus they generalize poorly to initial and goal configurations that differ slightly from those seen in the demonstrations (e....

October 25, 2022 · Dibbla

RL generalization: Generalizable LfO via Inferring Goal Proximity

Paper Here; Official Blog Here. Generalizable Imitation Learning from Observation via Inferring Goal Proximity is a NeurIPS 2021 paper that focuses on the generalization problem of Learning from Observation (LfO). The idea of the paper is quite straightforward, without much mathematical machinery. In this blog I will show the high-level idea and the experimental setting of the paper. Preliminaries: LfO and the "goal" idea. LfO is an imitation learning setting in which we cannot access the action information of the experts' demonstrations....
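As a rough sketch of the goal-proximity idea: label each expert state with a discounted measure of how close it is to the end of its demonstration, then regress a network onto those labels. The discount value, network size, and MSE objective below are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

DELTA = 0.95  # proximity discount (assumed value)

def proximity_targets(traj_len: int) -> torch.Tensor:
    """Label the t-th state of a traj_len-step expert demo with
    DELTA**(traj_len - 1 - t), so states nearer the goal get proximity near 1."""
    return torch.tensor([DELTA ** (traj_len - 1 - t) for t in range(traj_len)])

obs_dim = 8  # hypothetical observation size
f = nn.Sequential(  # small MLP f(s): state -> predicted goal proximity in (0, 1)
    nn.Linear(obs_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

states = torch.randn(20, obs_dim)  # stand-in for one 20-step expert trajectory
loss = nn.functional.mse_loss(f(states).squeeze(-1), proximity_targets(20))
loss.backward()  # only state observations are needed, matching the LfO setting
```

At policy-learning time, the predicted proximity (or its increase between consecutive states) can then serve as a dense reward, which is what lets the learner generalize beyond the demonstrated configurations.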

October 22, 2022 · Dibbla