
Watching Optimizers Learn the Shape of Error

Petrarch · May 18, 2026
Click to place a new starting point, then watch vanilla gradient descent, momentum, and Adam take different paths across the same loss surface.

The Gradient Descent artifact works, on its own and within the wider gallery, because it keeps a machine-learning cliché small enough to inspect. Instead of talking about optimization in the abstract, the piece gives you Rosenbrock's narrow valley, Himmelblau's four minima, Rastrigin's periodic traps, and a visible trail of every update. That scale matters. Emilien Dupont's 2018 optimization visualizations made the same point in a different register: simple two-dimensional surfaces are enough to show why momentum can overshoot, why plain gradient descent zigzags, and why adaptive methods often look more decisive when the terrain gets awkward.
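
For readers who want the formulas behind two of those names, here is a minimal sketch of Himmelblau's and Rastrigin's functions with their analytic gradients. It assumes the standard textbook definitions; the `sketchSurfaces` object and its layout are illustrative, and the artifact's own definitions may differ.

```javascript
// Sketch under stated assumptions: standard textbook forms of the two
// functions. The object name and layout are illustrative, not the artifact's.
const sketchSurfaces = {
  himmelblau: {
    // f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2; four minima, all with f = 0
    fn: (x, y) => (x * x + y - 11) ** 2 + (x + y * y - 7) ** 2,
    // Analytic gradient [df/dx, df/dy]
    grad: (x, y) => [
      4 * x * (x * x + y - 11) + 2 * (x + y * y - 7),
      2 * (x * x + y - 11) + 4 * y * (x + y * y - 7),
    ],
  },
  rastrigin: {
    // f(x, y) = 20 + x^2 - 10 cos(2 pi x) + y^2 - 10 cos(2 pi y); global minimum at (0, 0)
    fn: (x, y) =>
      20 + x * x - 10 * Math.cos(2 * Math.PI * x)
         + y * y - 10 * Math.cos(2 * Math.PI * y),
    // Analytic gradient [df/dx, df/dy]
    grad: (x, y) => [
      2 * x + 20 * Math.PI * Math.sin(2 * Math.PI * x),
      2 * y + 20 * Math.PI * Math.sin(2 * Math.PI * y),
    ],
  },
};

// One of Himmelblau's minima sits exactly at (3, 2).
console.log(sketchSurfaces.himmelblau.fn(3, 2)); // prints 0
```

Himmelblau's four minima all share the value zero, so the starting point decides which one an optimizer finds. Rastrigin has a single global minimum at the origin, surrounded by a regular grid of local ones.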

The artifact turns that lesson into a canvas habit. You click, the point lands, and each optimizer starts negotiating curvature instead of solving a clean equation. On Rosenbrock's function, first published in 1960 as a hard test case for hill-climbing methods, the path matters as much as the destination because the valley is narrow and curved, and its floor is nearly flat on the way to the minimum. Kingma and Ba's Adam paper framed the optimizer as a first-order method built from adaptive estimates of lower-order moments; here that idea becomes legible as motion. Adam does not merely arrive faster. It appears to read the surface differently, taking shorter corrective steps where the walls are steep and steadier ones where the valley opens.
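
To make the comparison concrete, the three update rules can be sketched in a few lines. This is a minimal illustration, not the artifact's implementation: the factory names, the `trail` helper, and the learning rates in the usage lines are assumptions chosen for the example, while Adam's default hyperparameters follow the values suggested in Kingma and Ba's paper.

```javascript
// Illustrative sketch only: names and learning rates are assumptions,
// not taken from the artifact's source.

// Gradient of Rosenbrock's function f(x, y) = 100(y - x^2)^2 + (1 - x)^2.
const rosenbrockGrad = (x, y) => [
  -400 * x * (y - x * x) + 2 * (x - 1),
  200 * (y - x * x),
];

// Each factory returns a step function: position [x, y] -> next position.
// Optimizer state (velocity, moment estimates) lives in the closure.

function makeGradientDescent(grad, lr) {
  return ([x, y]) => {
    const [gx, gy] = grad(x, y);
    return [x - lr * gx, y - lr * gy];
  };
}

function makeMomentum(grad, lr, beta = 0.9) {
  let v = [0, 0]; // running velocity, a decaying sum of past steps
  return ([x, y]) => {
    const g = grad(x, y);
    v = v.map((vi, i) => beta * vi - lr * g[i]);
    return [x + v[0], y + v[1]];
  };
}

function makeAdam(grad, lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) {
  let m = [0, 0]; // first-moment estimate (running mean of gradients)
  let v = [0, 0]; // second-moment estimate (running mean of squared gradients)
  let t = 0;      // step counter, used for bias correction
  return (p) => {
    const g = grad(p[0], p[1]);
    t += 1;
    m = m.map((mi, i) => beta1 * mi + (1 - beta1) * g[i]);
    v = v.map((vi, i) => beta2 * vi + (1 - beta2) * g[i] * g[i]);
    return p.map((pi, i) => {
      const mHat = m[i] / (1 - beta1 ** t); // bias-corrected first moment
      const vHat = v[i] / (1 - beta2 ** t); // bias-corrected second moment
      return pi - (lr * mHat) / (Math.sqrt(vHat) + eps);
    });
  };
}

// Record a trail of positions, the data a canvas would draw as a path.
function trail(step, start, n) {
  const points = [start];
  for (let i = 0; i < n; i++) points.push(step(points[points.length - 1]));
  return points;
}

// Usage: the same starting point, three different paths.
const start = [-1.5, 2.5];
const paths = {
  gd: trail(makeGradientDescent(rosenbrockGrad, 0.0002), start, 200),
  momentum: trail(makeMomentum(rosenbrockGrad, 0.0002), start, 200),
  adam: trail(makeAdam(rosenbrockGrad, 0.01), start, 200),
};
console.log(paths.gd.length, paths.momentum.length, paths.adam.length);
```

The structural difference is in the state each rule carries. Plain gradient descent remembers nothing, momentum remembers a velocity, and Adam keeps two running averages and divides one by the square root of the other, so its step in each coordinate depends on the recent history of gradients rather than on the current gradient's raw size.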

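// Test surfaces: each entry pairs a function with its analytic gradient.
const surfaces = {
  rosenbrock: {
    // f(x, y) = 100(y - x^2)^2 + (1 - x)^2, minimum at (1, 1)
    fn: (x, y) => 100 * (y - x * x) ** 2 + (1 - x) ** 2,
    // Analytic gradient [df/dx, df/dy]
    grad: (x, y) => [
      -400 * x * (y - x * x) + 2 * (x - 1),
      200 * (y - x * x)
    ],
    range: [-2, 2, -1, 3], // presumably [xMin, xMax, yMin, yMax]
    start: [-1.5, 2.5],    // default starting point
    logScale: true         // presumably log-scaled coloring for the wide value range
  }
};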

I like that the code stays close to the mathematics. The artifact does not hide the toy status of the problem or pretend a contour map is a neural network. It uses toy problems the way good teaching tools do: as places where the mechanism can stay visible long enough for a reader to notice it. The trail line, the point markers, and the choice of test surfaces make optimization look less like black-box intelligence and more like repeated local compromise under constraint.

That is also why this feels like a strong fit for creative coding. The piece is technically a visualizer, but it behaves like an argument about legibility. It says that part of the history of machine learning now lives in interface conventions: heat maps, contour lines, red trajectories, sliders for learning rate, a button that changes "Run" to "Pause." None of that is neutral. Each element helps turn invisible numerical adjustment into something a person can watch, compare, and mistrust a little more intelligently. Dupont's essay, Rosenbrock's benchmark, and the Adam paper all sit behind the artifact, but the artifact's real achievement is simpler. It lets you see optimization as a path through error, not a magic jump to an answer.