Example 14 (intermediate)
Tags: Autograd · Gradients · Backpropagation

Automatic Differentiation (Autograd)

Automatic differentiation (autograd) is the engine behind neural network training. This example shows how parameter() creates GradTensors that record every operation into a directed acyclic graph (DAG). When you call .backward() on a scalar loss, Deepbox traverses that graph in reverse topological order, applying the chain rule to compute a gradient for every parameter.
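To make the "record into a DAG, then traverse in reverse" idea concrete, here is a minimal scalar reverse-mode sketch in plain TypeScript. This is an illustrative toy, not Deepbox's actual implementation: each op records its parents and a closure that pushes gradients backward through the chain rule.

```typescript
class Value {
  grad = 0;
  // Set by each op: how to push this node's gradient to its parents.
  backprop: () => void = () => {};
  constructor(public data: number, public parents: Value[] = []) {}

  mul(other: Value): Value {
    const out = new Value(this.data * other.data, [this, other]);
    out.backprop = () => {
      // Product rule: d(ab)/da = b, d(ab)/db = a
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  backward(): void {
    // Order the recorded DAG topologically, then apply the chain rule in reverse.
    const order: Value[] = [];
    const seen = new Set<Value>();
    const visit = (node: Value): void => {
      if (seen.has(node)) return;
      seen.add(node);
      for (const p of node.parents) visit(p);
      order.push(node);
    };
    visit(this);
    this.grad = 1; // d(out)/d(out) = 1
    for (let i = order.length - 1; i >= 0; i--) order[i].backprop();
  }
}

// f(x) = x * x at x = 3  =>  df/dx = 2x = 6
const v = new Value(3);
const f = v.mul(v);
f.backward();
console.log(f.data, v.grad); // 9 6
```

Note that gradients are accumulated with `+=`, which is also why real frameworks (including Deepbox's .zeroGrad()) need an explicit reset between training steps.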

Deepbox Modules Used

deepbox/ndarray

What You Will Learn

  • parameter() creates GradTensors that track computation graphs
  • .backward() computes gradients via reverse-mode autodiff
  • Gradients follow the chain rule through all operations
  • noGrad() disables graph tracking; use it during inference to save memory
  • .zeroGrad() resets accumulated gradients before each training step

Source Code

14-autograd/index.ts
import { GradTensor, noGrad, parameter, tensor } from "deepbox/ndarray";

console.log("=== Automatic Differentiation ===\n");

// ---------------------------------------------------------------------------
// Part 1: Basic gradient computation
// ---------------------------------------------------------------------------
console.log("--- Part 1: Basic Gradients ---");

// f(x) = x^2  =>  df/dx = 2x
const x = parameter([2, 3, 4]);
const y = x.mul(x).sum();
y.backward();

console.log("x       :", x.tensor.toString());
console.log("f(x)=x²  sum:", y.tensor.toString());
console.log("grad    :", x.grad?.toString() ?? "null");
// Expected gradients: [4, 6, 8]

// ---------------------------------------------------------------------------
// Part 2: Multi-variable gradients
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: Multi-Variable Gradients ---");

const a = parameter([
  [1, 2],
  [3, 4],
]);
const w = parameter([[0.5], [0.5]]);

// y = sum(a @ w)
const z = a.matmul(w).sum();
z.backward();

console.log("a =", a.tensor.toString());
console.log("w =", w.tensor.toString());
console.log("z = sum(a @ w) =", z.tensor.toString());
console.log("dz/da =", a.grad?.toString() ?? "null");
console.log("dz/dw =", w.grad?.toString() ?? "null");

// ---------------------------------------------------------------------------
// Part 3: Chained operations
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: Chained Operations ---");

const p = parameter([1, 2, 3, 4]);

// f(p) = sum(relu(p * 2 - 3))
const scaled = p.mul(GradTensor.fromTensor(tensor([2, 2, 2, 2]), { requiresGrad: false }));
const shifted = scaled.sub(GradTensor.fromTensor(tensor([3, 3, 3, 3]), { requiresGrad: false }));
const activated = shifted.relu();
const loss = activated.sum();
loss.backward();

console.log("p       :", p.tensor.toString());
console.log("2p - 3  :", shifted.tensor.toString());
console.log("relu    :", activated.tensor.toString());
console.log("grad    :", p.grad?.toString() ?? "null");

// ---------------------------------------------------------------------------
// Part 4: noGrad for inference
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: noGrad for Inference ---");

const q = parameter([1, 2, 3]);
noGrad(() => {
  // Operations inside noGrad do not track gradients
  const result = q.mul(q);
  console.log("noGrad result:", result.tensor.toString());
  console.log("requiresGrad:", result.requiresGrad);
});

// ---------------------------------------------------------------------------
// Part 5: Zero gradients and re-compute
// ---------------------------------------------------------------------------
console.log("\n--- Part 5: Gradient Accumulation ---");

const v = parameter([1, 2, 3]);

// First backward
const loss1 = v.mul(v).sum();
loss1.backward();
console.log("After first backward, grad:", v.grad?.toString() ?? "null");

// Zero gradients before second pass
v.zeroGrad();
console.log("After zeroGrad, grad:", v.grad?.toString() ?? "null");

// Second backward with different computation
const loss2 = v.mul(GradTensor.fromTensor(tensor([3, 3, 3]), { requiresGrad: false })).sum();
loss2.backward();
console.log("After second backward, grad:", v.grad?.toString() ?? "null");

console.log("\n=== Autograd Complete ===");

Console Output

$ npx tsx 14-autograd/index.ts
=== Automatic Differentiation ===

--- Part 1: Basic Gradients ---
x       : tensor([2, 3, 4], dtype=float32)
f(x)=x²  sum: tensor(29, dtype=float32)
grad    : tensor([4, 6, 8], dtype=float32)

--- Part 2: Multi-Variable Gradients ---
a = tensor([[1, 2]
       [3, 4]], dtype=float32)
w = tensor([[0.5000]
       [0.5000]], dtype=float32)
z = sum(a @ w) = tensor(5, dtype=float32)
dz/da = tensor([[0.5000, 0.5000]
       [0.5000, 0.5000]], dtype=float32)
dz/dw = tensor([[4]
       [6]], dtype=float32)

--- Part 3: Chained Operations ---
p       : tensor([1, 2, 3, 4], dtype=float32)
2p - 3  : tensor([-1, 1, 3, 5], dtype=float32)
relu    : tensor([0, 1, 3, 5], dtype=float32)
grad    : tensor([0, 2, 2, 2], dtype=float32)

--- Part 4: noGrad for Inference ---
noGrad result: tensor([1, 4, 9], dtype=float32)
requiresGrad: false

--- Part 5: Gradient Accumulation ---
After first backward, grad: tensor([2, 4, 6], dtype=float32)
After zeroGrad, grad: tensor([0, 0, 0], dtype=float32)
After second backward, grad: tensor([3, 3, 3], dtype=float32)

=== Autograd Complete ===
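The Part 3 gradient in the output above is also easy to reproduce by hand: f(p) = sum(relu(2p - 3)), so df/dp is 2 wherever 2p - 3 > 0 and 0 elsewhere, because relu zeroes the gradient on its negative branch. A one-liner with plain arrays confirms it:

```typescript
// Elementwise chain rule through relu(2p - 3): grad = 2 if 2p - 3 > 0, else 0
const p = [1, 2, 3, 4];
const grad = p.map((v) => (2 * v - 3 > 0 ? 2 : 0));
console.log(grad); // [0, 2, 2, 2]
```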