Visualization of Optimization Algorithms

This notebook gives an intuitive visualization and animation of three optimization algorithms, with adjustable parameters. In each algorithm, a differentiable function $f$ and a starting position $x_0$ are given, and the goal is to iteratively approach $\operatorname{argmin}(f)$.

We use two functions to illustrate different sets of challenges that the optimization algorithms face. The first is $f(x,y)=x^2+by^2$ with a large learning rate, where we see the different oscillation behaviors of the three algorithms. The second is $f(x,y)=x^6+by^4$, where the gradient in a neighborhood of $0$ is small, and we compare how quickly the three algorithms converge.
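As a minimal sketch of the two test functions and their gradients (the constant $b$ and its default value here are illustrative assumptions, not the notebook's settings):

```python
import numpy as np

def f1(x, y, b=10.0):
    """First test function: f(x, y) = x^2 + b*y^2 (an elongated bowl)."""
    return x**2 + b * y**2

def grad_f1(x, y, b=10.0):
    """Gradient of f1: (2x, 2by)."""
    return np.array([2.0 * x, 2.0 * b * y])

def f2(x, y, b=10.0):
    """Second test function: f(x, y) = x^6 + b*y^4 (flat near the origin)."""
    return x**6 + b * y**4

def grad_f2(x, y, b=10.0):
    """Gradient of f2: (6x^5, 4by^3)."""
    return np.array([6.0 * x**5, 4.0 * b * y**3])
```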

Regular gradient descent

$$x_{k+1} = x_k - \alpha \nabla f(x_k),$$

where $\alpha$ is the learning rate.
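A minimal sketch of this update rule (the step count and default $\alpha$ are placeholder assumptions; `grad` is any function returning $\nabla f$ at a point, such as `grad_f1` above):

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, n_steps=50):
    """Plain gradient descent: x_{k+1} = x_k - alpha * grad(x_k).

    Returns the full trajectory so it can be plotted or animated.
    """
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(xs[-1] - alpha * np.asarray(grad(*xs[-1])))
    return np.array(xs)
```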

Momentum gradient descent

$$\begin{aligned} z_{k+1} &= \beta z_k + \nabla f(x_k) \\ x_{k+1} &= x_k - \alpha z_{k+1}, \end{aligned}$$

where $\alpha$ is the learning rate, and $\beta$ is the momentum coefficient.
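A sketch of the momentum (heavy-ball) update under the same assumptions as the previous snippet, with $z_0 = 0$ (the default $\beta$ is an illustrative choice):

```python
import numpy as np

def momentum_descent(grad, x0, alpha=0.1, beta=0.9, n_steps=50):
    """Momentum gradient descent:
    z_{k+1} = beta * z_k + grad(x_k);  x_{k+1} = x_k - alpha * z_{k+1}.
    """
    x = np.asarray(x0, dtype=float)
    z = np.zeros_like(x)            # momentum buffer z_0 = 0
    xs = [x.copy()]
    for _ in range(n_steps):
        z = beta * z + np.asarray(grad(*x))
        x = x - alpha * z
        xs.append(x.copy())
    return np.array(xs)
```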

Nesterov Acceleration

$$x_{k+1} = x_k + \beta\left(x_k - x_{k-1}\right) - \alpha \nabla f\big(x_k + \gamma (x_k-x_{k-1})\big),$$

where $\alpha$ is the learning rate, $\beta$ is the momentum coefficient, and $\gamma$ is the Nesterov coefficient.
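A sketch of the Nesterov update in the extrapolation form above, evaluating the gradient at the look-ahead point $x_k + \gamma(x_k - x_{k-1})$ (parameter defaults are assumptions; taking $x_{-1} = x_0$ makes the first step a plain gradient step):

```python
import numpy as np

def nesterov_descent(grad, x0, alpha=0.1, beta=0.9, gamma=0.9, n_steps=50):
    """Nesterov-accelerated update:
    x_{k+1} = x_k + beta*(x_k - x_{k-1}) - alpha * grad(x_k + gamma*(x_k - x_{k-1})).
    """
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()               # x_{-1} = x_0, so the first step has no momentum
    xs = [x.copy()]
    for _ in range(n_steps):
        d = x - x_prev                       # previous step direction x_k - x_{k-1}
        lookahead = x + gamma * d            # point where the gradient is evaluated
        x_prev, x = x, x + beta * d - alpha * np.asarray(grad(*lookahead))
        xs.append(x.copy())
    return np.array(xs)
```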

 # jupyter notebook code omitted ...