optimization machine learning visualization

Watching Optimizers Learn the Shape of Error

Petrarch · May 18, 2026

Click to place a new starting point, then watch vanilla gradient descent, momentum, and Adam take different paths across the same loss surface.

The Gradient Descent artifact works, both on its own and as part of the wider gallery, because it keeps a machine-learning cliché small enough to inspect. Instead of talking about optimization in the abstract, the piece gives you Rosenbrock's narrow valley, Himmelblau's four minima, Rastrigin's periodic traps, and a visible trail of every update. That scale matters. Emilien Dupont's 2018 optimization visualizations made the same point in a different register: simple two-dimensional surfaces are enough to show why momentum can overshoot, why plain gradient descent zigzags, and why adaptive methods often look more decisive when the terrain gets awkward.

The artifact turns that lesson into a canvas habit. You click, the point lands, and each optimizer starts negotiating curvature instead of solving a clean equation. On Rosenbrock's function, first published in 1960 as a hard test case for hill-climbing methods, the path matters as much as the destination because the valley bends before it narrows. Kingma and Ba's Adam paper framed the optimizer as a first-order method built from adaptive estimates of lower-order moments, in practice running averages of the gradient and of its square; here that idea becomes legible as motion. Adam does not merely arrive faster. Because each coordinate's step is divided by a running estimate of that coordinate's gradient magnitude, it appears to read the surface differently, taking shorter corrective steps where the walls are steep and steadier ones where the valley opens.
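A minimal sketch of the three update rules can make that difference concrete. Everything named here is an assumption for illustration: the functions (sgdStep, momentumStep, adamStep, trace), the shared state object, and the default hyperparameters are mine, not the artifact's, and its own implementation may differ. The Adam step follows the bias-corrected form from Kingma and Ba's paper; the momentum step uses the common heavy-ball form.

```javascript
// Hypothetical sketch: names and defaults are illustrative assumptions,
// not taken from the artifact's source.

// Plain gradient descent: step directly against the gradient.
function sgdStep(p, g, state, lr) {
  return [p[0] - lr * g[0], p[1] - lr * g[1]];
}

// Momentum (heavy-ball form): accumulate a velocity, then move along it.
function momentumStep(p, g, state, lr, beta = 0.9) {
  state.v = state.v || [0, 0];
  state.v = state.v.map((v, i) => beta * v - lr * g[i]);
  return p.map((x, i) => x + state.v[i]);
}

// Adam: bias-corrected running averages of the gradient (m) and its square (s).
function adamStep(p, g, state, lr, b1 = 0.9, b2 = 0.999, eps = 1e-8) {
  state.t = (state.t || 0) + 1;
  state.m = (state.m || [0, 0]).map((m, i) => b1 * m + (1 - b1) * g[i]);
  state.s = (state.s || [0, 0]).map((s, i) => b2 * s + (1 - b2) * g[i] * g[i]);
  return p.map((x, i) => {
    const mHat = state.m[i] / (1 - b1 ** state.t);
    const sHat = state.s[i] / (1 - b2 ** state.t);
    return x - (lr * mHat) / (Math.sqrt(sHat) + eps);
  });
}

// Run one optimizer from a start point and keep every visited point,
// which is the data a trail line would connect.
function trace(step, grad, start, lr, steps) {
  const state = {};
  const trail = [start];
  let p = start;
  for (let k = 0; k < steps; k++) {
    p = step(p, grad(p[0], p[1]), state, lr);
    trail.push(p);
  }
  return trail;
}
```

Because grad takes (x, y) and returns a pair, the same trace helper could be pointed at the Rosenbrock surface defined below, for example trace(adamStep, surfaces.rosenbrock.grad, [-1.5, 2.5], 0.01, 500). The learning rate and step count in that call are placeholders, not values from the artifact.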

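const surfaces={
  rosenbrock:{
    // f(x, y) = 100(y - x^2)^2 + (1 - x)^2, global minimum at (1, 1)
    fn:(x,y)=>100*(y-x*x)**2+(1-x)**2,
    // analytic gradient [df/dx, df/dy]
    grad:(x,y)=>[
      -400*x*(y-x*x)+2*(x-1),
      200*(y-x*x)
    ],
    range:[-2,2,-1,3],
    start:[-1.5,2.5],
    logScale:true
  }
};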

I like that the code stays close to the mathematics: fn is Rosenbrock's function written out term by term, and grad is its analytic gradient rather than a numerical approximation. The artifact presents these toy problems as places where the mechanism can stay visible long enough for a reader to notice it. The trail line, the point markers, and the choice of test surfaces make optimization look like repeated local compromise under constraint.
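One way to see how close the code stays to the mathematics is to compare the analytic gradient against a finite-difference estimate. The sketch below copies fn and grad from the surfaces object above; numericGrad, gradError, and the step size h are hypothetical additions of mine, not part of the artifact.

```javascript
// Rosenbrock function and gradient, copied from the surfaces object above.
const fn = (x, y) => 100 * (y - x * x) ** 2 + (1 - x) ** 2;
const grad = (x, y) => [
  -400 * x * (y - x * x) + 2 * (x - 1),
  200 * (y - x * x)
];

// Central-difference estimate of the gradient; h is an assumed step size.
function numericGrad(f, x, y, h = 1e-5) {
  return [
    (f(x + h, y) - f(x - h, y)) / (2 * h),
    (f(x, y + h) - f(x, y - h)) / (2 * h)
  ];
}

// Largest absolute gap between the analytic and numeric gradients at (x, y).
function gradError(x, y) {
  const a = grad(x, y);
  const n = numericGrad(fn, x, y);
  return Math.max(Math.abs(a[0] - n[0]), Math.abs(a[1] - n[1]));
}
```

At the start point in the source, (-1.5, 2.5), the analytic gradient evaluates to [145, 50], and at the minimum (1, 1) both the function and its gradient are zero. A small gradError at a handful of points is a reasonable sign that the hand-written derivative matches the function it claims to describe.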

Creative coding makes those optimization paths legible through interface conventions: heat maps, contour lines, red trajectories, sliders for learning rate, and a button that changes "Run" to "Pause." Each element turns invisible numerical adjustment into something a person can watch, compare, and mistrust a little more intelligently. Dupont's essay, Rosenbrock's benchmark, and the Adam paper all sit behind an interface that shows optimization as a path through error.
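Two small conversions sit underneath most of those conventions: surface coordinates have to become pixels, and loss values have to become colour intensities. The sketch below is a guess at how that might look, not the artifact's code. toPixel and intensity are hypothetical helpers, and I am assuming that range is ordered [xMin, xMax, yMin, yMax] and that logScale means the heat map compresses large loss values logarithmically.

```javascript
// Hypothetical helpers; names and conventions are assumptions.

// Map a point (x, y) in surface coordinates to canvas pixels.
// Assumes range is ordered [xMin, xMax, yMin, yMax].
function toPixel(x, y, range, width, height) {
  const [xMin, xMax, yMin, yMax] = range;
  const px = ((x - xMin) / (xMax - xMin)) * width;
  // Canvas y grows downward, so flip the vertical axis.
  const py = (1 - (y - yMin) / (yMax - yMin)) * height;
  return [px, py];
}

// Normalize a loss value to [0, 1] for a heat map, optionally on a log scale.
// log1p keeps the mapping finite when value equals min.
function intensity(value, min, max, logScale) {
  const t = logScale
    ? Math.log1p(value - min) / Math.log1p(max - min)
    : (value - min) / (max - min);
  return Math.min(1, Math.max(0, t));
}
```

On a surface like Rosenbrock, where the loss climbs steeply away from the valley, a logarithmic mapping spends more of the colour range near the floor, which is where the trails are easiest to compare.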