Compute Gradient for Matrix
This is an additional but useful note. First recap the derivatives for scalars, for example: $\frac{dy}{dx} = nx^{n-1}$ for $y = x^n$. And we all know the rules for different kinds of functions/composed functions. Note that the derivative does not always exist. When we generalize derivatives to gradients, we are generalizing scalars vectors. In this case, the shape matters. scalar vector scalar $\frac{\partial y}{\partial x}$ $\frac{\partial y}{\partial \textbf{x}}$ scalar $\frac{\partial \textbf{y}}{\partial x}$ $\frac{\partial \textbf{y}}{\partial \textbf{x}}$ Case 1: y is scalar, x is vector $$x = [x_1,x_2,x_3,\cdots,x_n]^T$$ $$\frac{\partial y}{\partial \textbf{x}}=[\frac{\partial y}{\partial x_1},\frac{\partial y}{\partial x_2},\cdots,\frac{\partial y}{\partial x_n}]$$...