Automatic Differentiation (Autograd)
Automatic differentiation (autograd) is the engine behind neural network training. This example shows how parameter() creates GradTensors that record every operation into a directed acyclic graph (DAG). When you call .backward() on a scalar loss, Deepbox traverses that graph in reverse topological order, applying the chain rule at each node to compute a gradient for every parameter.
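The record-then-traverse idea can be sketched in a few lines. The Value class below is a hypothetical toy for illustration, not Deepbox's GradTensor: each node remembers its parents and a closure that applies the chain rule, and backward() replays those closures in reverse topological order.

```ts
// Toy scalar reverse-mode autodiff (illustration only, NOT Deepbox internals).
class Value {
  grad = 0;
  parents: Value[] = [];
  backprop: () => void = () => {};

  constructor(public data: number) {}

  mul(other: Value): Value {
    const out = new Value(this.data * other.data);
    out.parents = [this, other];
    out.backprop = () => {
      // Chain rule: d(a*b)/da = b, d(a*b)/db = a
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  add(other: Value): Value {
    const out = new Value(this.data + other.data);
    out.parents = [this, other];
    out.backprop = () => {
      this.grad += out.grad;
      other.grad += out.grad;
    };
    return out;
  }

  backward(): void {
    // Topologically order the DAG, then apply the chain rule in reverse
    const topo: Value[] = [];
    const seen = new Set<Value>();
    const visit = (v: Value) => {
      if (seen.has(v)) return;
      seen.add(v);
      for (const parent of v.parents) visit(parent);
      topo.push(v);
    };
    visit(this);
    this.grad = 1; // d(loss)/d(loss) = 1
    for (let i = topo.length - 1; i >= 0; i--) topo[i].backprop();
  }
}

// f(x) = x*x + x  =>  df/dx = 2x + 1, which is 7 at x = 3
const xv = new Value(3);
const fv = xv.mul(xv).add(xv);
fv.backward();
console.log(fv.data, xv.grad); // 12 7
```

Note that gradients accumulate with `+=`, which is exactly why a real library needs something like .zeroGrad() between training steps.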
Deepbox Modules Used
deepbox/ndarray
What You Will Learn
- parameter() creates GradTensors that track computation graphs
- .backward() computes gradients via reverse-mode autodiff
- Gradients follow the chain rule through all operations
- noGrad() disables tracking — use for inference to save memory
- .zeroGrad() resets accumulated gradients before each training step
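These pieces combine into the standard training cycle: zero the gradients, run backward, update the parameters. A minimal self-contained sketch of that cycle, with the gradient of f(w) = (w - 5)^2 written out by hand (df/dw = 2(w - 5)) rather than computed by Deepbox:

```ts
// Sketch of the zero-grad / backward / update loop, with a hand-derived gradient.
let w = 0;      // parameter, initialized at 0
const lr = 0.1; // learning rate

for (let step = 0; step < 50; step++) {
  let grad = 0;        // zeroGrad: start each step from a clean slate
  grad += 2 * (w - 5); // backward: accumulate d(loss)/dw via the chain rule
  w -= lr * grad;      // optimizer update (plain gradient descent)
}
console.log(w.toFixed(3)); // converges toward the minimum at w = 5
```

Skipping the zeroing step would add each step's gradient on top of the last one, which is the failure mode Part 5 below demonstrates and .zeroGrad() prevents.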
Source Code
14-autograd/index.ts
```ts
import { GradTensor, noGrad, parameter, tensor } from "deepbox/ndarray";

console.log("=== Automatic Differentiation ===\n");

// ---------------------------------------------------------------------------
// Part 1: Basic gradient computation
// ---------------------------------------------------------------------------
console.log("--- Part 1: Basic Gradients ---");

// f(x) = x^2 => df/dx = 2x
const x = parameter([2, 3, 4]);
const y = x.mul(x).sum();
y.backward();

console.log("x :", x.tensor.toString());
console.log("f(x)=x² sum:", y.tensor.toString());
console.log("grad :", x.grad?.toString() ?? "null");
// Expected gradients: [4, 6, 8]

// ---------------------------------------------------------------------------
// Part 2: Multi-variable gradients
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: Multi-Variable Gradients ---");

const a = parameter([
  [1, 2],
  [3, 4],
]);
const w = parameter([[0.5], [0.5]]);

// z = sum(a @ w)
const z = a.matmul(w).sum();
z.backward();

console.log("a =", a.tensor.toString());
console.log("w =", w.tensor.toString());
console.log("z = sum(a @ w) =", z.tensor.toString());
console.log("dz/da =", a.grad?.toString() ?? "null");
console.log("dz/dw =", w.grad?.toString() ?? "null");

// ---------------------------------------------------------------------------
// Part 3: Chained operations
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: Chained Operations ---");

const p = parameter([1, 2, 3, 4]);

// f(p) = sum(relu(p * 2 - 3))
const scaled = p.mul(GradTensor.fromTensor(tensor([2, 2, 2, 2]), { requiresGrad: false }));
const shifted = scaled.sub(GradTensor.fromTensor(tensor([3, 3, 3, 3]), { requiresGrad: false }));
const activated = shifted.relu();
const loss = activated.sum();
loss.backward();

console.log("p :", p.tensor.toString());
console.log("2p - 3 :", shifted.tensor.toString());
console.log("relu :", activated.tensor.toString());
console.log("grad :", p.grad?.toString() ?? "null");

// ---------------------------------------------------------------------------
// Part 4: noGrad for inference
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: noGrad for Inference ---");

const q = parameter([1, 2, 3]);
noGrad(() => {
  // Operations inside noGrad do not track gradients
  const result = q.mul(q);
  console.log("noGrad result:", result.tensor.toString());
  console.log("requiresGrad:", result.requiresGrad);
});

// ---------------------------------------------------------------------------
// Part 5: Zero gradients and re-compute
// ---------------------------------------------------------------------------
console.log("\n--- Part 5: Gradient Accumulation ---");

const v = parameter([1, 2, 3]);

// First backward
const loss1 = v.mul(v).sum();
loss1.backward();
console.log("After first backward, grad:", v.grad?.toString() ?? "null");

// Zero gradients before second pass
v.zeroGrad();
console.log("After zeroGrad, grad:", v.grad?.toString() ?? "null");

// Second backward with different computation
const loss2 = v.mul(GradTensor.fromTensor(tensor([3, 3, 3]), { requiresGrad: false })).sum();
loss2.backward();
console.log("After second backward, grad:", v.grad?.toString() ?? "null");

console.log("\n=== Autograd Complete ===");
```

Console Output
$ npx tsx 14-autograd/index.ts
=== Automatic Differentiation ===
--- Part 1: Basic Gradients ---
x : tensor([2, 3, 4], dtype=float32)
f(x)=x² sum: tensor(29, dtype=float32)
grad : tensor([4, 6, 8], dtype=float32)
--- Part 2: Multi-Variable Gradients ---
a = tensor([[1, 2]
[3, 4]], dtype=float32)
w = tensor([[0.5000]
[0.5000]], dtype=float32)
z = sum(a @ w) = tensor(5, dtype=float32)
dz/da = tensor([[0.5000, 0.5000]
[0.5000, 0.5000]], dtype=float32)
dz/dw = tensor([[4]
[6]], dtype=float32)
--- Part 3: Chained Operations ---
p : tensor([1, 2, 3, 4], dtype=float32)
2p - 3 : tensor([-1, 1, 3, 5], dtype=float32)
relu : tensor([0, 1, 3, 5], dtype=float32)
grad : tensor([0, 2, 2, 2], dtype=float32)
--- Part 4: noGrad for Inference ---
noGrad result: tensor([1, 4, 9], dtype=float32)
requiresGrad: false
--- Part 5: Gradient Accumulation ---
After first backward, grad: tensor([2, 4, 6], dtype=float32)
After zeroGrad, grad: tensor([0, 0, 0], dtype=float32)
After second backward, grad: tensor([3, 3, 3], dtype=float32)
=== Autograd Complete ===
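Part 1's gradients can be cross-checked without any autograd at all: a central finite difference (f(x+h) - f(x-h)) / 2h approximates each partial derivative of f(x) = sum(x_i²) numerically, and should land on the same [4, 6, 8] that .backward() produced.

```ts
// Numerical gradient check for f(x) = sum(x_i^2) at x = [2, 3, 4].
const f = (xs: number[]) => xs.reduce((s, v) => s + v * v, 0);
const h = 1e-4;

const xs = [2, 3, 4];
const numeric = xs.map((_, i) => {
  const plus = xs.map((v, j) => (j === i ? v + h : v));
  const minus = xs.map((v, j) => (j === i ? v - h : v));
  return (f(plus) - f(minus)) / (2 * h); // central difference in dimension i
});
console.log(numeric.map((g) => Math.round(g))); // [ 4, 6, 8 ], matching x.grad
```

This kind of finite-difference check is a standard way to validate any autograd result; for quadratics the central difference is exact up to floating-point error.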