Course website: dlsyscourse.org, YouTube videos
2. Softmax Regression
Lecture 02: Softmax Regression (YouTube link)
Three ingredients of a machine learning algorithm:
- The hypothesis class: describes how we map inputs to outputs.
- The loss function: specifies how well a given hypothesis performs, i.e., how far its predictions are from the true outputs.
- An optimization method: a procedure for minimizing the sum of losses over the training set.
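As a concrete instance of the first ingredient, the lecture uses the linear hypothesis class, mapping an $n$-dimensional input to $k$ class scores (logits):

$$
h_\Theta(x) = \Theta^T x, \qquad \Theta \in \mathbb{R}^{n \times k}, \; x \in \mathbb{R}^n.
$$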
Softmax / cross-entropy loss
Use the softmax function to normalize the output vector, turning each dimension into the probability of the corresponding class.
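In symbols, with logits $z = h(x) \in \mathbb{R}^k$ and true label $y$:

$$
\text{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{k} \exp(z_j)}, \qquad
\ell_{ce}(h(x), y) = -\log \text{softmax}(h(x))_y = -h_y(x) + \log \sum_{j=1}^{k} \exp(h_j(x)).
$$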
The gradient of the softmax objective.
Approach #1 (right way): Use matrix differential calculus, Jacobians, Kronecker products, and vectorization
Approach #2 (hacky quick way): Pretend everything is a scalar, use the typical chain rule, then rearrange/transpose matrices/vectors to make the sizes work (and check your answer numerically).
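Either approach arrives at the same compact result: the gradient with respect to the logits is the softmax probabilities minus the one-hot label, and for the linear hypothesis over a batch $X \in \mathbb{R}^{m \times n}$,

$$
\nabla_z \ell_{ce}(z, y) = \text{softmax}(z) - e_y, \qquad
\nabla_\Theta \ell_{ce}(X\Theta, y) = X^T (Z - I_y),
$$

where $Z$ applies softmax row-wise to $X\Theta$ and $I_y$ stacks the one-hot labels.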
3. Manual Neural Networks / Backprop
Lecture 03: Manual Neural Networks / Backprop
Early frameworks ran the backprop algorithm directly on the original computation graph to compute gradients; today's mainstream frameworks instead use reverse mode AD, which constructs a separate backward computation graph for the gradients.
Benefits of reverse mode AD:
- Gradients of gradients can be computed, since the gradient computation is itself a graph.
- Additional optimizations can be applied to the computation graph (e.g., operator fusion).
4. Automatic Differentiation
There are several methods for computing derivatives.
Numerical differentiation
Numerical differentiation suffers from numerical error and is inefficient to compute.
However, numerical differentiation is a powerful tool for checking an implementation of an automatic differentiation algorithm in unit test cases.
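A minimal sketch of such a check (function and variable names are mine, not course code), using the central-difference approximation $\frac{\partial f}{\partial \theta_i} \approx \frac{f(\theta + \epsilon e_i) - f(\theta - \epsilon e_i)}{2\epsilon}$:

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-5):
    """Approximate the gradient of a scalar function f at theta via central differences."""
    grad = np.zeros_like(theta)
    it = np.nditer(theta, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        old = theta[idx]
        theta[idx] = old + eps
        f_plus = f(theta)
        theta[idx] = old - eps
        f_minus = f(theta)
        theta[idx] = old  # restore the perturbed entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

# Example: verify the softmax-loss gradient from the previous section, for label y = 2.
theta = np.random.randn(4)
f = lambda t: np.log(np.exp(t).sum()) - t[2]          # cross-entropy loss in logits
analytic = np.exp(theta) / np.exp(theta).sum() - np.eye(4)[2]  # softmax(theta) - e_y
assert np.allclose(numerical_grad(f, theta), analytic, atol=1e-6)
```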
Symbolic differentiation
Write down the formulas and derive the gradient by the sum, product, and chain rules.
Doing so naively can waste computation. For example, for $f(\theta) = \prod_{i=1}^{n} \theta_i$, each partial derivative $\frac{\partial f}{\partial \theta_k} = \prod_{j \neq k} \theta_j$ is a product of $n-1$ factors (i.e., $n-2$ multiplications), so computing all $n$ partial gradients independently costs $n(n-2)$ multiplications.
Forward mode AD
Computational graph
Each node represents an (intermediate) value in the computation; edges represent input-output relations.
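A standard example (the one used in the lecture):

$$
y = f(x_1, x_2) = \ln(x_1) + x_1 x_2 - \sin(x_2),
$$

which decomposes into the nodes $v_1 = x_1$, $v_2 = x_2$, $v_3 = \ln v_1$, $v_4 = v_1 v_2$, $v_5 = \sin v_2$, $v_6 = v_3 + v_4$, $y = v_7 = v_6 - v_5$.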
Forward mode automatic differentiation (AD)
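Forward mode defines $\dot{v}_i = \frac{\partial v_i}{\partial x_1}$ for each node and propagates these derivatives through the graph in the same order as the forward computation. A minimal sketch in Python (my own illustration, not course code):

```python
import math

class Dual:
    """Forward-mode AD value: carries (val, dot), where dot = derivative w.r.t. the seeded input."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __sub__(self, other):
        return Dual(self.val - other.val, self.dot - other.dot)
    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def log(x):  # (ln u)' = u'/u
    return Dual(math.log(x.val), x.dot / x.val)

def sin(x):  # (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# df/dx1 of f(x1, x2) = ln(x1) + x1*x2 - sin(x2) at (2, 5): seed x1 with dot = 1.
x1, x2 = Dual(2.0, dot=1.0), Dual(5.0)
y = log(x1) + x1 * x2 - sin(x2)
print(y.val, y.dot)  # y.dot == 1/2 + 5 = 5.5
```

Note that one pass yields only $\partial f / \partial x_1$; computing $\partial f / \partial x_2$ requires a second pass with $x_2$ seeded instead, which is exactly the limitation below.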
The limitation of forward mode AD: for $f: \mathbb{R}^n \rightarrow \mathbb{R}^k$, we need $n$ forward AD passes to get the gradient with respect to each input.
We mostly care about cases where $k = 1$ and $n$ is large. To handle this regime efficiently, we need another kind of AD.
Reverse mode AD
Reverse mode automatic differentiation (Reverse mode AD)
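Reverse mode AD attaches to each node $v_i$ its adjoint, the derivative of the output with respect to that node,

$$
\bar{v}_i = \frac{\partial y}{\partial v_i},
$$

and evaluates the adjoints in reverse topological order, starting from $\bar{v}_{\text{out}} = 1$.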
Derivation for the multiple-pathway case:
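When $v_i$ is consumed by several downstream nodes, each path contributes a partial adjoint $\bar{v}_{i \to j} = \bar{v}_j \frac{\partial v_j}{\partial v_i}$, and the full adjoint sums them:

$$
\bar{v}_i = \sum_{j:\, i \in \text{inputs}(j)} \bar{v}_j \frac{\partial v_j}{\partial v_i}.
$$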
Reverse AD algorithm
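A Python-style sketch of the algorithm, following the structure presented in the lecture (`reverse_topo_order`, `node.inputs`, and `node.partial_adjoint` are hypothetical placeholders, not a real framework API):

```python
def gradients(out_node, input_nodes):
    # node -> list of partial adjoints, one per downstream path that consumes it
    node_to_grad = {out_node: [1.0]}
    for node in reverse_topo_order(out_node):   # visit nodes in reverse topological order
        v_bar = sum(node_to_grad[node])         # adjoint = sum of partial adjoints over all paths
        for k in node.inputs:
            # partial adjoint along the edge k -> node: v_bar * dv_node/dv_k,
            # itself constructed as new nodes in the (extended) computation graph
            node_to_grad.setdefault(k, []).append(node.partial_adjoint(v_bar, k))
    return [sum(node_to_grad[n]) for n in input_nodes]
```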
Difference between Backprop and Reverse mode AD
Backprop, used in first-generation frameworks, runs the backward pass directly on the forward graph; reverse mode AD instead extends the graph with explicit new nodes for the adjoint computation (as noted in section 3).
Why do we use reverse mode AD?
With reverse mode AD it is easy to take gradients of gradients: the gradient computation is itself expressed as a computation graph, so we can differentiate it again.
It also gives the underlying machine learning framework many more opportunities to apply optimizations to the gradient computation.
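A small illustration of the gradient-of-gradient point (shown here with PyTorch, which implements reverse mode AD; this is my example, not course code):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative dy/dx = 3x^2; create_graph=True records it as a graph node.
(g,) = torch.autograd.grad(y, x, create_graph=True)

# Second derivative d2y/dx2 = 6x, obtained by differentiating the gradient graph again.
(g2,) = torch.autograd.grad(g, x)
print(g.item(), g2.item())  # 12.0, 12.0
```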
5.
6. Optimization
7. Neural Network Library Abstractions
8.
- Post link: https://sanzo.top/Default/dlsys/