Hung-Yi Lee

Tutorial 3-1: RNN

By Yinggan XU Dibbla

This is based on a previous course (not included in Lee’s 2022 series); the video can be found here: RNN. The RNN aims to deal with sequential inputs. We can first focus on the problem of slot filling: Time:______ Destination:_____ Here, Time and Destination are the slots. We would like to fill in the slots automatically from a given sentence: I would like to fly to Taipei on Nov 2nd. We have to know that “Taipei” is the destination and “Nov 2nd” is the time. ...

Tutorial 2: CNN

By Yinggan XU Dibbla

This is based on a previous course (not included in Lee’s 2022 series); the video can be found here: CNN. The motivation is that we can of course use an MLP to find a function that performs image classification and similar tasks. However, this is unnecessary and inefficient because of the tremendous number of parameters. Instead, we are going to use the properties of images themselves. Before that, we need to know the structure of a picture. For an RGB picture, each pixel is determined by 3 values: Red, Green, and Blue (RGB), and R, G, B are called channels. So when we are given an RGB picture, we are actually given a 3-layer matrix. ...
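The "3-layer matrix" view of an RGB picture, and the parameter count that makes a plain MLP inefficient, can be sketched as follows. This is only an illustration: the 100x100 image size and the 1000-unit hidden layer are assumed values chosen for the example, not figures from the tutorial.

```python
# Assumed sizes for illustration only; the tutorial does not fix them.
height, width, channels = 100, 100, 3

# An RGB picture as nested lists: image[row][col] holds [R, G, B].
image = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]
image[0][0] = [255, 0, 0]  # make the top-left pixel pure red

# Each channel on its own is a height x width matrix.
red_channel = [[pixel[0] for pixel in row] for row in image]

# A fully connected first layer needs one weight per (input value, hidden unit).
flat_inputs = height * width * channels
hidden_units = 1000  # assumed layer width
mlp_first_layer_weights = flat_inputs * hidden_units

print(len(red_channel), len(red_channel[0]))
print(mlp_first_layer_weights)
```

Under these assumptions, the first layer alone already needs tens of millions of weights, which is the inefficiency the excerpt refers to.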

Tutorial 1: Optimizers

By Yinggan XU Dibbla

The tutorial video can be found here. This notebook only covers the basic optimizers and the ideas behind them. However, optimizers for DL remain a very interesting question. Background Knowledge: $\mu$-Strong Convexity. We can refer to this note. A function $f$ is $\mu$-strongly convex if: $$f(y)\ge f(x)+\nabla f(x)^T(y-x) + \frac{\mu}{2}\|y-x\|^2 \quad \text{for some $\mu>0$ and for all $x,y$}$$ Note that strong convexity doesn’t necessarily require the function to be differentiable; the gradient is replaced by a sub-gradient when the function is non-smooth. ...
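The strong-convexity inequality above can be checked numerically on sample points. The sketch below is a sanity check, not a proof: the test function $f(x)=x^2$, the helper name `satisfies_strong_convexity`, the sample range, and the tolerance are all assumptions made for this example.

```python
import random


def f(x):
    # Example function (assumed for illustration): f(x) = x^2.
    return x * x


def grad_f(x):
    # Gradient of the example function.
    return 2 * x


def satisfies_strong_convexity(f, grad_f, mu, points, tol=1e-9):
    """Check f(y) >= f(x) + f'(x)(y - x) + (mu / 2)(y - x)^2 on all sample pairs."""
    for x in points:
        for y in points:
            lower_bound = f(x) + grad_f(x) * (y - x) + 0.5 * mu * (y - x) ** 2
            if f(y) < lower_bound - tol:
                return False
    return True


random.seed(0)
samples = [random.uniform(-5, 5) for _ in range(50)]

# For f(x) = x^2 the inequality holds for any mu <= 2 and fails for mu > 2.
holds_mu_1 = satisfies_strong_convexity(f, grad_f, 1.0, samples)
holds_mu_3 = satisfies_strong_convexity(f, grad_f, 3.0, samples)
print(holds_mu_1, holds_mu_3)
```

For this particular $f$, the gap between the two sides is $(1-\frac{\mu}{2})(y-x)^2$, so the check passes for $\mu=1$ and fails for $\mu=3$.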

Lecture 3: Validation & Why Deep?

By Yinggan XU Dibbla

In this lecture, Lee introduces the idea of selecting the best model through validation-set performance. Lee also explains how a deep NN outperforms a fat (wide) NN. Validation set: The core question we want to figure out is why we can use a validation set and still overfit. graph LR id1[model 1 with para space H1] id2[model 2 with para space H2] id3[model 3 with para space H3] id4[Validation Set] id1-->id4 id2-->id4 id3-->id4 id5[Validation-Loss-1=0.4] id6[Validation-Loss-2=0.3] id7[Validation-Loss-3=0.2] id4-->id5 id4-->id6 id4-->id7 Then we are going to choose model 3 with $h_3\in\mathcal{H}_3$. ...

Lecture 1&2: Basics of ML and Why Fail

By Yinggan XU Dibbla

This notebook is basically a summary of what is covered in Hung-Yi Lee’s ML course 2022. Sometimes people hold conflicting views, such as whether PPO is an off-policy algorithm (and the notebook may contain grammar mistakes of my own), but in this notebook I’m going to keep things aligned with Hung-Yi Lee’s view. Lecture 1 - Intro & Basic Idea of ML: The basic idea of ML, according to Lee, is to find a function. The function can take various types of inputs and produce different kinds of outputs as well. ...