Entropy and Mutual Info

Reference: here, a well-written introduction to both concepts. Entropy: “The entropy of a random variable is a function which attempts to characterize the ‘unpredictability’ of a random variable.” The unpredictability is related to both the frequencies and the number of outcomes. A fair 666-sided die is more unpredictable than a fair 6-sided one. But if we cheat on the 666-sided die by making the side numbered 1 super heavy, we may then find it more predictable....
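
A minimal sketch of this comparison, computing Shannon entropy in bits with NumPy (the 99% weight on side 1 is an illustrative assumption, not a figure from the post):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0 * log 0 as 0
    return -np.sum(p * np.log2(p))

fair6   = np.ones(6) / 6
fair666 = np.ones(666) / 666

# A loaded 666-sided die: side 1 comes up 99% of the time (illustrative numbers)
loaded666 = np.full(666, 0.01 / 665)
loaded666[0] = 0.99

print(entropy(fair6))      # ~2.58 bits
print(entropy(fair666))    # ~9.38 bits: more unpredictable than the fair 6-sided die
print(entropy(loaded666))  # ~0.17 bits: now more predictable than the fair 6-sided die
```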

September 19, 2022 · Dibbla

Compute Gradient for Matrix

This is an additional but useful note. First, recap derivatives for scalars, for example $\frac{dy}{dx} = nx^{n-1}$ for $y = x^n$, and we all know the rules for the various kinds of functions and composed functions. Note that the derivative does not always exist. When we generalize derivatives to gradients, we move from scalars to vectors, and in this case the shape matters:

|  | scalar $x$ | vector $\textbf{x}$ |
|---|---|---|
| scalar $y$ | $\frac{\partial y}{\partial x}$ | $\frac{\partial y}{\partial \textbf{x}}$ |
| vector $\textbf{y}$ | $\frac{\partial \textbf{y}}{\partial x}$ | $\frac{\partial \textbf{y}}{\partial \textbf{x}}$ |

Case 1: $y$ is a scalar, $\textbf{x}$ is a vector. $$x = [x_1,x_2,x_3,\cdots,x_n]^T$$ $$\frac{\partial y}{\partial \textbf{x}}=[\frac{\partial y}{\partial x_1},\frac{\partial y}{\partial x_2},\cdots,\frac{\partial y}{\partial x_n}]$$...
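
As a quick check of Case 1 (scalar $y$, vector $\textbf{x}$), here is a minimal sketch using PyTorch autograd; the test function $y = \textbf{x}^T\textbf{x}$ and the library choice are my assumptions, not from the note. Whether the result is written as a row or a column vector is just the layout convention.

```python
import torch

# y = x^T x is a scalar function of the vector x; analytically dy/dx = 2x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x @ x          # scalar output
y.backward()

print(x.grad)      # tensor([2., 4., 6.]) == 2 * x
```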

August 6, 2022 · Dibbla

Tutorial 3-1: RNN

By Yinggan XU (Dibbla). This is based on a previous course (not included in Lee’s 2022 series); the video can be found here: RNN. The RNN aims to deal with sequential inputs. We can first focus on the problem of slot filling: Time: ______ Destination: ______ Here, Time and Destination are the slots. We would like to fill in the slots automatically given a sentence: I would like to fly to Taipei on Nov 2nd....
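
A minimal, untrained sketch of the slot-filling setup (the vocabulary, label set, and layer sizes are illustrative assumptions, not from the tutorial): one RNN hidden state per token, followed by a per-token slot classifier.

```python
import torch
import torch.nn as nn

# Toy slot filling: tag each token of the sentence with a slot label.
tokens = "I would like to fly to Taipei on Nov 2nd".split()
labels = ["O", "Time", "Destination"]                 # O = not a slot

vocab = {w: i for i, w in enumerate(dict.fromkeys(tokens))}  # unique tokens -> ids
x = torch.tensor([vocab[w] for w in tokens])                 # token ids, shape (seq_len,)

embed = nn.Embedding(len(vocab), 16)
rnn   = nn.RNN(input_size=16, hidden_size=32, batch_first=True)
head  = nn.Linear(32, len(labels))                    # per-step slot classifier

h, _ = rnn(embed(x).unsqueeze(0))                     # (1, seq_len, 32): one hidden state per token
pred = head(h).argmax(-1).squeeze(0)                  # predicted slot label per token (untrained)
print([(w, labels[i]) for w, i in zip(tokens, pred.tolist())])
```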

July 15, 2022 · Dibbla

Tutorial 2: CNN

By Yinggan XU (Dibbla). This is based on a previous course (not included in Lee’s 2022 series); the video can be found here: CNN. The motivation: we could of course use an MLP to find a function for image classification and similar tasks, but that is neither necessary nor efficient because of the tremendous number of parameters. Instead, we are going to exploit the properties of images themselves. Before that, we need to know the structure of a picture....
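
To make the parameter-count argument concrete, a small sketch comparing a fully connected first layer with a single convolutional layer (the 100×100 RGB input and layer sizes are illustrative assumptions, not from the tutorial):

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

mlp  = nn.Linear(3 * 100 * 100, 1000)                            # fully connected first layer
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)  # one conv layer

print(n_params(mlp))   # 30,001,000 weights and biases
print(n_params(conv))  # 1,792 -- the same small filters are reused across the whole image
```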

July 3, 2022 · Dibbla

Tutorial 1: Optimizers

By Yinggan XU (Dibbla). The tutorial video can be found here. This notebook covers only the basic optimizers and their ideas; that said, optimizers for DL remain a very interesting question. Background Knowledge: $\mu$-Strong Convexity. We can refer to this note. A function $f$ is $\mu$-strongly convex if, for a fixed $\mu > 0$ and all $x, y$: $$f(y)\ge f(x)+\nabla f(x)^T(y-x) + \frac{\mu}{2}\|y-x\|^2$$ Note that strong convexity does not require the function to be differentiable: the gradient is replaced by a sub-gradient when the function is non-smooth....
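
As a quick worked example (mine, not from the tutorial), $f(x)=\frac{1}{2}\|x\|^2$ is 1-strongly convex: here $\nabla f(x) = x$, and expanding the square gives equality in the definition,

$$\frac{1}{2}\|y\|^2 = \frac{1}{2}\|x\|^2 + x^T(y-x) + \frac{1}{2}\|y-x\|^2,$$

so the inequality holds with $\mu = 1$.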

July 2, 2022 · Dibbla