This part contains my notes for
- Hung-Yi Lee’s famous ML course
- ML basics & related math
- Other topics that I’d like to present
This is a scribe of CS294-082 by Prof. Gerald Friedland from UC Berkeley. The Idea of the Function Counting Theorem: we start with the Function Counting Theorem (Cover’s Theorem, Thomas M. Cover, 1965). For example, consider a 2-dimensional space with 4 points. There are multiple ways to linearly separate these points. Look at $l_5$: it separates $x_1$, $x_4$ on the left side and $x_2$, $x_3$ on the right side....
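As a quick sanity check of the counting idea, here is a minimal sketch in Python. It assumes the standard statement of Cover’s theorem (which the excerpt above only names): the number of dichotomies of $n$ points in general position in $\mathbb{R}^d$ that are separable by a hyperplane through the origin is $C(n, d) = 2\sum_{k=0}^{d-1}\binom{n-1}{k}$, and separating with an affine line in the plane corresponds to one extra homogeneous dimension, i.e. $d = 3$.

```python
from math import comb

def cover_count(n: int, d: int) -> int:
    """Cover's Function Counting Theorem (1965):
    number of dichotomies of n points in general position in R^d
    that are separable by a hyperplane through the origin:
        C(n, d) = 2 * sum_{k=0}^{d-1} C(n-1, k)
    """
    return 2 * sum(comb(n - 1, k) for k in range(d))

# 4 points in the plane, separated by affine lines:
# a bias term acts like one extra homogeneous dimension, so use d = 3.
print(cover_count(4, 3))           # 14 of the 2**4 = 16 dichotomies
print(cover_count(4, 3) / 2 ** 4)  # 0.875 of all dichotomies are separable
```

For four points in convex position, the two non-separable dichotomies are the XOR-like diagonal splits.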
Reference: Here, which is a well-written introduction to both concepts. Entropy “The entropy of a random variable is a function which attempts to characterize the ‘unpredictability’ of a random variable.” The unpredictability is related to both the frequency and the number of outcomes. A fair 666-sided die is more unpredictable than a 6-sided one. But if we cheat on the 666-sided die by making the side with number 1 super heavy, we may then find the 666-sided die more predictable....
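To make the die intuition concrete, here is a small sketch computing Shannon entropy in bits. The loading probabilities (side 1 coming up 99% of the time) are assumed illustrative numbers, not taken from the note.

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_6   = [1 / 6] * 6
fair_666 = [1 / 666] * 666
# Hypothetical loaded 666-sided die: side 1 with probability 0.99,
# the remaining 0.01 spread evenly over the other 665 sides.
loaded_666 = [0.99] + [0.01 / 665] * 665

print(entropy(fair_6))     # ~2.585 bits
print(entropy(fair_666))   # ~9.379 bits -> more unpredictable
print(entropy(loaded_666)) # ~0.175 bits -> more predictable than a fair 6-sided die
```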
This is an additional but useful note. First, recap derivatives for scalars, for example: $\frac{dy}{dx} = nx^{n-1}$ for $y = x^n$. We also know the rules for the various kinds of functions and composed functions. Note that the derivative does not always exist. When we generalize derivatives to gradients, we are generalizing from scalars to vectors. In this case, the shape matters:

|  | $x$ (scalar) | $\textbf{x}$ (vector) |
|---|---|---|
| $y$ (scalar) | $\frac{\partial y}{\partial x}$ | $\frac{\partial y}{\partial \textbf{x}}$ |
| $\textbf{y}$ (vector) | $\frac{\partial \textbf{y}}{\partial x}$ | $\frac{\partial \textbf{y}}{\partial \textbf{x}}$ |

Case 1: $y$ is a scalar, $\textbf{x}$ is a vector.

$$\textbf{x} = [x_1, x_2, x_3, \cdots, x_n]^T$$

$$\frac{\partial y}{\partial \textbf{x}} = [\frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \cdots, \frac{\partial y}{\partial x_n}]$$...
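Case 1 can be checked numerically. A minimal sketch, assuming $y = \textbf{x}^T\textbf{x}$ as the example function (so the analytic gradient is $2\textbf{x}$, laid out as a row vector per the numerator-layout convention in the table above):

```python
import numpy as np

def y(x):
    # Scalar-valued function of a vector: y = sum_i x_i^2
    return float(x @ x)

def grad_y(x):
    # Analytic gradient: dy/dx = 2x
    return 2 * x

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([1.0, -2.0, 3.0])
print(grad_y(x))           # [ 2. -4.  6.]
print(numeric_grad(y, x))  # matches the analytic gradient up to ~1e-9
```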