Example 14 (intermediate)
Tags: Autograd · Gradients · Backpropagation

Automatic Differentiation (Autograd)

Automatic differentiation (autograd) is the engine behind neural network training. This example shows how parameter() creates GradTensors that record every operation into a directed acyclic graph (DAG). When you call .backward() on a scalar loss, Deepbox traverses that graph in reverse topological order, applying the chain rule to compute a gradient for every parameter.
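To make the "record into a DAG, then traverse in reverse" idea concrete, here is a minimal scalar reverse-mode sketch in plain TypeScript. This is an illustrative toy, not Deepbox's actual implementation: each op records its parents and a closure that pushes gradients backward through the chain rule.

```typescript
class Value {
  grad = 0;
  // Set by each op: how to push this node's gradient to its parents.
  backprop: () => void = () => {};
  constructor(public data: number, public parents: Value[] = []) {}

  mul(other: Value): Value {
    const out = new Value(this.data * other.data, [this, other]);
    out.backprop = () => {
      // Product rule: d(ab)/da = b, d(ab)/db = a
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  backward(): void {
    // Order the recorded DAG topologically, then apply the chain rule in reverse.
    const order: Value[] = [];
    const seen = new Set<Value>();
    const visit = (node: Value): void => {
      if (seen.has(node)) return;
      seen.add(node);
      for (const p of node.parents) visit(p);
      order.push(node);
    };
    visit(this);
    this.grad = 1; // d(out)/d(out) = 1
    for (let i = order.length - 1; i >= 0; i--) order[i].backprop();
  }
}

// f(x) = x * x at x = 3  =>  df/dx = 2x = 6
const v = new Value(3);
const f = v.mul(v);
f.backward();
console.log(f.data, v.grad); // 9 6
```

Note that gradients are accumulated with `+=`, which is also why real frameworks (including Deepbox's .zeroGrad()) need an explicit reset between training steps.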

Deepbox Modules Used

deepbox/ndarray

What You Will Learn

  • parameter() creates GradTensors that track computation graphs
  • .backward() computes gradients via reverse-mode autodiff
  • Gradients follow the chain rule through all operations
  • noGrad() disables graph tracking; use it during inference to save memory
  • .zeroGrad() resets accumulated gradients before each training step

Source Code

14-autograd/index.ts
import { GradTensor, noGrad, parameter, tensor } from "deepbox/ndarray";

console.log("=== Automatic Differentiation ===\n");

// ---------------------------------------------------------------------------
// Part 1: Basic gradient computation
// ---------------------------------------------------------------------------
console.log("--- Part 1: Basic Gradients ---");

// f(x) = x^2  =>  df/dx = 2x
const x = parameter([2, 3, 4]);
const y = x.mul(x).sum();
y.backward();

console.log("x       :", x.tensor.toString());
console.log("f(x)=x²  sum:", y.tensor.toString());
console.log("grad    :", x.grad?.toString() ?? "null");
// Expected gradients: [4, 6, 8]

// ---------------------------------------------------------------------------
// Part 2: Multi-variable gradients
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: Multi-Variable Gradients ---");

const a = parameter([
  [1, 2],
  [3, 4],
]);
const w = parameter([[0.5], [0.5]]);

// y = sum(a @ w)
const z = a.matmul(w).sum();
z.backward();

console.log("a =", a.tensor.toString());
console.log("w =", w.tensor.toString());
console.log("z = sum(a @ w) =", z.tensor.toString());
console.log("dz/da =", a.grad?.toString() ?? "null");
console.log("dz/dw =", w.grad?.toString() ?? "null");

// ---------------------------------------------------------------------------
// Part 3: Chained operations
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: Chained Operations ---");

const p = parameter([1, 2, 3, 4]);

// f(p) = sum(relu(p * 2 - 3))
const scaled = p.mul(GradTensor.fromTensor(tensor([2, 2, 2, 2]), { requiresGrad: false }));
const shifted = scaled.sub(GradTensor.fromTensor(tensor([3, 3, 3, 3]), { requiresGrad: false }));
const activated = shifted.relu();
const loss = activated.sum();
loss.backward();

console.log("p       :", p.tensor.toString());
console.log("2p - 3  :", shifted.tensor.toString());
console.log("relu    :", activated.tensor.toString());
console.log("grad    :", p.grad?.toString() ?? "null");

// ---------------------------------------------------------------------------
// Part 4: noGrad for inference
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: noGrad for Inference ---");

const q = parameter([1, 2, 3]);
noGrad(() => {
  // Operations inside noGrad do not track gradients
  const result = q.mul(q);
  console.log("noGrad result:", result.tensor.toString());
  console.log("requiresGrad:", result.requiresGrad);
});

// ---------------------------------------------------------------------------
// Part 5: Zero gradients and re-compute
// ---------------------------------------------------------------------------
console.log("\n--- Part 5: Gradient Accumulation ---");

const v = parameter([1, 2, 3]);

// First backward
const loss1 = v.mul(v).sum();
loss1.backward();
console.log("After first backward, grad:", v.grad?.toString() ?? "null");

// Zero gradients before second pass
v.zeroGrad();
console.log("After zeroGrad, grad:", v.grad?.toString() ?? "null");

// Second backward with different computation
const loss2 = v.mul(GradTensor.fromTensor(tensor([3, 3, 3]), { requiresGrad: false })).sum();
loss2.backward();
console.log("After second backward, grad:", v.grad?.toString() ?? "null");

console.log("\n=== Autograd Complete ===");

Console Output

$ npx tsx 14-autograd/index.ts
=== Automatic Differentiation ===

--- Part 1: Basic Gradients ---
x       : tensor([2, 3, 4], dtype=float32)
f(x)=x²  sum: tensor(29, dtype=float32)
grad    : tensor([4, 6, 8], dtype=float32)

--- Part 2: Multi-Variable Gradients ---
a = tensor([[1, 2]
       [3, 4]], dtype=float32)
w = tensor([[0.5000]
       [0.5000]], dtype=float32)
z = sum(a @ w) = tensor(5, dtype=float32)
dz/da = tensor([[0.5000, 0.5000]
       [0.5000, 0.5000]], dtype=float32)
dz/dw = tensor([[4]
       [6]], dtype=float32)

--- Part 3: Chained Operations ---
p       : tensor([1, 2, 3, 4], dtype=float32)
2p - 3  : tensor([-1, 1, 3, 5], dtype=float32)
relu    : tensor([0, 1, 3, 5], dtype=float32)
grad    : tensor([0, 2, 2, 2], dtype=float32)

--- Part 4: noGrad for Inference ---
noGrad result: tensor([1, 4, 9], dtype=float32)
requiresGrad: false

--- Part 5: Gradient Accumulation ---
After first backward, grad: tensor([2, 4, 6], dtype=float32)
After zeroGrad, grad: tensor([0, 0, 0], dtype=float32)
After second backward, grad: tensor([3, 3, 3], dtype=float32)

=== Autograd Complete ===
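The Part 3 gradient in the output above is also easy to reproduce by hand: f(p) = sum(relu(2p - 3)), so df/dp is 2 wherever 2p - 3 > 0 and 0 elsewhere, because relu zeroes the gradient on its negative branch. A one-liner with plain arrays confirms it:

```typescript
// Elementwise chain rule through relu(2p - 3): grad = 2 if 2p - 3 > 0, else 0
const p = [1, 2, 3, 4];
const grad = p.map((v) => (2 * v - 3 > 0 ? 2 : 0));
console.log(grad); // [0, 2, 2, 2]
```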