Learning Rate Schedulers
Learning rate scheduling is critical for training neural networks effectively. This example demonstrates all 8 Deepbox schedulers: StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR, LinearLR, ReduceLROnPlateau, OneCycleLR, and WarmupLR. Each scheduler is stepped epoch by epoch and its LR trajectory is printed.
Deepbox Modules Used
- deepbox/nn
- deepbox/optim

What You Will Learn
- StepLR/MultiStepLR for simple milestone-based decay
- CosineAnnealingLR for smooth decay — popular in modern training
- OneCycleLR ramps up then down — often gives best results
- WarmupLR prevents early training instability with large learning rates
- Call scheduler.step() once per epoch (once per optimizer step for OneCycleLR); see the placement sketch below
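
Concretely, here is where the step() call sits in a typical loop. This is a minimal sketch with placeholder names (epochs, trainOneEpoch, scheduler), not a specific Deepbox API:

```ts
// Placement sketch: placeholder names only, not Deepbox-specific APIs.
declare const epochs: number;
declare function trainOneEpoch(): void; // batch loop: forward, backward, optimizer.step()
declare const scheduler: { step: (metric?: number) => void };

for (let epoch = 0; epoch < epochs; epoch++) {
  trainOneEpoch();
  scheduler.step(); // most schedulers advance once per epoch
  // OneCycleLR instead steps once per batch, inside the batch loop;
  // ReduceLROnPlateau takes the monitored metric: scheduler.step(valLoss)
}
```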
Source Code
16-lr-schedulers/index.ts
```ts
import { Linear, ReLU, Sequential } from "deepbox/nn";
import {
  Adam,
  CosineAnnealingLR,
  ExponentialLR,
  LinearLR,
  MultiStepLR,
  OneCycleLR,
  ReduceLROnPlateau,
  StepLR,
  WarmupLR,
} from "deepbox/optim";

console.log("=== Learning Rate Schedulers ===\n");

// Create a small model and optimizer for demonstration
const createOptimizer = () => {
  const model = new Sequential(new Linear(4, 8), new ReLU(), new Linear(8, 1));
  return new Adam(model.parameters(), { lr: 0.1 });
};

// ---------------------------------------------------------------------------
// Part 1: StepLR — decay every N steps
// ---------------------------------------------------------------------------
console.log("--- Part 1: StepLR ---");

const opt1 = createOptimizer();
const stepLR = new StepLR(opt1, { stepSize: 3, gamma: 0.5 });

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = stepLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  stepLR.step();
}

// ---------------------------------------------------------------------------
// Part 2: MultiStepLR — decay at specific milestones
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: MultiStepLR ---");

const opt2 = createOptimizer();
const multiStepLR = new MultiStepLR(opt2, {
  milestones: [3, 6, 8],
  gamma: 0.5,
});

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = multiStepLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  multiStepLR.step();
}

// ---------------------------------------------------------------------------
// Part 3: ExponentialLR — exponential decay each epoch
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: ExponentialLR ---");

const opt3 = createOptimizer();
const expLR = new ExponentialLR(opt3, { gamma: 0.9 });

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = expLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  expLR.step();
}

// ---------------------------------------------------------------------------
// Part 4: CosineAnnealingLR — cosine annealing
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: CosineAnnealingLR ---");

const opt4 = createOptimizer();
const cosineLR = new CosineAnnealingLR(opt4, { T_max: 10, etaMin: 0.001 });

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = cosineLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  cosineLR.step();
}

// ---------------------------------------------------------------------------
// Part 5: LinearLR — linear warmup / decay
// ---------------------------------------------------------------------------
console.log("\n--- Part 5: LinearLR ---");

const opt5 = createOptimizer();
const linearLR = new LinearLR(opt5, {
  startFactor: 0.1,
  endFactor: 1.0,
  totalIters: 5,
});

for (let epoch = 0; epoch < 8; epoch++) {
  const lr = linearLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  linearLR.step();
}

// ---------------------------------------------------------------------------
// Part 6: ReduceLROnPlateau — reduce when metric stops improving
// ---------------------------------------------------------------------------
console.log("\n--- Part 6: ReduceLROnPlateau ---");

const opt6 = createOptimizer();
const plateauLR = new ReduceLROnPlateau(opt6, { factor: 0.5, patience: 2 });

// Simulate a training loop where loss plateaus
const fakeLosses = [1.0, 0.8, 0.6, 0.59, 0.58, 0.58, 0.58, 0.3, 0.29, 0.29];
for (let epoch = 0; epoch < fakeLosses.length; epoch++) {
  const loss = fakeLosses[epoch];
  plateauLR.step(loss);
  console.log(
    `  Epoch ${epoch}: loss = ${loss.toFixed(2)}, lr = ${plateauLR.getLastLr()[0]?.toFixed(6)}`
  );
}

// ---------------------------------------------------------------------------
// Part 7: WarmupLR — linear warmup then constant
// ---------------------------------------------------------------------------
console.log("\n--- Part 7: WarmupLR ---");

const opt7 = createOptimizer();
const warmupLR = new WarmupLR(opt7, null, { warmupEpochs: 5 });

for (let epoch = 0; epoch < 8; epoch++) {
  const lr = warmupLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  warmupLR.step();
}

// ---------------------------------------------------------------------------
// Part 8: OneCycleLR — super-convergence schedule
// ---------------------------------------------------------------------------
console.log("\n--- Part 8: OneCycleLR ---");

const opt8 = createOptimizer();
const oneCycleLR = new OneCycleLR(opt8, { maxLr: 0.1, totalSteps: 10 });

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = oneCycleLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  oneCycleLR.step();
}

console.log("\n=== Learning Rate Schedulers Complete ===");
```

Console Output
$ npx tsx 16-lr-schedulers/index.ts
=== Learning Rate Schedulers ===
--- Part 1: StepLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.100000
Epoch 2: lr = 0.100000
Epoch 3: lr = 0.100000
Epoch 4: lr = 0.050000
Epoch 5: lr = 0.050000
Epoch 6: lr = 0.050000
Epoch 7: lr = 0.025000
Epoch 8: lr = 0.025000
Epoch 9: lr = 0.025000
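
The rule behind this trajectory is simple step decay. The sketch below is the textbook formula, not Deepbox's internal code; note that the loop logs the LR before calling step(), so the printed values lag the formula by one epoch:

```ts
// Textbook step decay: multiply by gamma once every `stepSize` epochs.
const stepDecay = (lr0: number, gamma: number, stepSize: number, epoch: number) =>
  lr0 * gamma ** Math.floor(epoch / stepSize);
// stepDecay(0.1, 0.5, 3, 3) === 0.05, matching the drop printed at epoch 4 above
```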
--- Part 2: MultiStepLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.100000
Epoch 2: lr = 0.100000
Epoch 3: lr = 0.100000
Epoch 4: lr = 0.050000
Epoch 5: lr = 0.050000
Epoch 6: lr = 0.050000
Epoch 7: lr = 0.025000
Epoch 8: lr = 0.025000
Epoch 9: lr = 0.012500
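
MultiStepLR applies the same multiplicative cut, but only at the listed milestones. A sketch of the standard rule (again one epoch ahead of the printed trace):

```ts
// Multi-step decay: multiply by gamma once per milestone already passed.
const multiStepDecay = (lr0: number, gamma: number, milestones: number[], epoch: number) =>
  lr0 * gamma ** milestones.filter((m) => epoch >= m).length;
// multiStepDecay(0.1, 0.5, [3, 6, 8], 6) === 0.025, printed at epoch 7 above
```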
--- Part 3: ExponentialLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.100000
Epoch 2: lr = 0.090000
Epoch 3: lr = 0.081000
Epoch 4: lr = 0.072900
Epoch 5: lr = 0.065610
Epoch 6: lr = 0.059049
Epoch 7: lr = 0.053144
Epoch 8: lr = 0.047830
Epoch 9: lr = 0.043047
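
Exponential decay multiplies by gamma every epoch, so the LR follows lr0 · gamma^t. A minimal check against the numbers above:

```ts
// Exponential decay: lr_t = lr_0 * gamma^t.
const expDecay = (lr0: number, gamma: number, epoch: number) => lr0 * gamma ** epoch;
// expDecay(0.1, 0.9, 2) === 0.081, the value printed at epoch 3 above
```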
--- Part 4: CosineAnnealingLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.100000
Epoch 2: lr = 0.097577
Epoch 3: lr = 0.090546
Epoch 4: lr = 0.079595
Epoch 5: lr = 0.065796
Epoch 6: lr = 0.050500
Epoch 7: lr = 0.035204
Epoch 8: lr = 0.021405
Epoch 9: lr = 0.010454
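
Cosine annealing follows half a cosine wave from the base LR down to etaMin over T_max steps. A sketch of the standard formula, with the printed trace again shifted by one epoch:

```ts
// Cosine annealing: etaMin + (lr0 - etaMin) * (1 + cos(pi * t / T_max)) / 2.
const cosineAnneal = (lr0: number, etaMin: number, tMax: number, t: number) =>
  etaMin + 0.5 * (lr0 - etaMin) * (1 + Math.cos((Math.PI * t) / tMax));
// cosineAnneal(0.1, 0.001, 10, 1) ≈ 0.097577, printed at epoch 2 above
```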
--- Part 5: LinearLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.010000
Epoch 2: lr = 0.028000
Epoch 3: lr = 0.046000
Epoch 4: lr = 0.064000
Epoch 5: lr = 0.082000
Epoch 6: lr = 0.100000
Epoch 7: lr = 0.100000
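
LinearLR scales the base LR by a factor interpolated linearly from startFactor to endFactor over totalIters, then holds it. A sketch (epoch 0 above shows the optimizer's base LR before the first step()):

```ts
// Linear factor schedule: lr_t = lr0 * (start + (end - start) * min(t, total) / total).
const linearLr = (lr0: number, start: number, end: number, total: number, t: number) =>
  lr0 * (start + ((end - start) * Math.min(t, total)) / total);
// linearLr(0.1, 0.1, 1.0, 5, 0) === 0.01, printed at epoch 1 above
```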
--- Part 6: ReduceLROnPlateau ---
Epoch 0: loss = 1.00, lr = 0.100000
Epoch 1: loss = 0.80, lr = 0.100000
Epoch 2: loss = 0.60, lr = 0.100000
Epoch 3: loss = 0.59, lr = 0.100000
Epoch 4: loss = 0.58, lr = 0.100000
Epoch 5: loss = 0.58, lr = 0.100000
Epoch 6: loss = 0.58, lr = 0.100000
Epoch 7: loss = 0.30, lr = 0.100000
Epoch 8: loss = 0.29, lr = 0.100000
Epoch 9: loss = 0.29, lr = 0.100000
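
Note that the LR never actually drops here: with patience 2, a cut triggers only after the loss fails to improve for more than two consecutive epochs, and the plateau at 0.58 lasts exactly two epochs (5 and 6) before the loss improves again. A sketch of that bookkeeping, under assumed semantics consistent with the trace above:

```ts
// Plateau logic sketch (assumed semantics, not Deepbox's internal code).
let best = Infinity;
let badEpochs = 0;
let lr = 0.1;
for (const loss of [1.0, 0.8, 0.6, 0.59, 0.58, 0.58, 0.58, 0.3, 0.29, 0.29]) {
  if (loss < best) {
    best = loss;
    badEpochs = 0;
  } else if (++badEpochs > 2) {
    lr *= 0.5; // cut the LR once patience is exhausted
    badEpochs = 0;
  }
}
// lr is still 0.1: no plateau lasted long enough to trigger a cut
```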
--- Part 7: WarmupLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.020000
Epoch 2: lr = 0.040000
Epoch 3: lr = 0.060000
Epoch 4: lr = 0.080000
Epoch 5: lr = 0.100000
Epoch 6: lr = 0.100000
Epoch 7: lr = 0.100000
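
WarmupLR ramps the LR linearly up to the optimizer's base LR over warmupEpochs, then holds it constant (epoch 0 above prints 0.1 simply because the optimizer still carries its base LR before the first step()). A sketch:

```ts
// Linear warmup then constant: ramp up to lr0 over `warmupEpochs` steps.
const warmupLr = (lr0: number, warmupEpochs: number, t: number) =>
  lr0 * Math.min(t / warmupEpochs, 1);
// warmupLr(0.1, 5, 1) === 0.02, the value printed at epoch 1 above
```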
--- Part 8: OneCycleLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.004000
Epoch 2: lr = 0.036000
Epoch 3: lr = 0.068000
Epoch 4: lr = 0.100000
Epoch 5: lr = 0.095049
Epoch 6: lr = 0.081176
Epoch 7: lr = 0.061130
Epoch 8: lr = 0.038880
Epoch 9: lr = 0.018834
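
The one-cycle policy ramps up to maxLr and then anneals down close to zero. The sketch below reverse-engineers the trace above under assumed defaults (initial LR of maxLr/25 and a 30% ramp-up phase, mirroring common one-cycle conventions); Deepbox's actual defaults may differ, and the printed trace is again offset by one epoch, with epoch 0 showing the base LR:

```ts
// One-cycle sketch: linear ramp-up for the first 30% of steps, cosine descent after.
const oneCycle = (maxLr: number, totalSteps: number, t: number) => {
  const initLr = maxLr / 25; // assumed div factor of 25
  const up = Math.round(0.3 * totalSteps); // assumed 30% ramp-up phase
  if (t <= up) return initLr + ((maxLr - initLr) * t) / up;
  return maxLr * 0.5 * (1 + Math.cos((Math.PI * (t - up)) / (totalSteps - up)));
};
// oneCycle(0.1, 10, 1) === 0.036; oneCycle(0.1, 10, 4) ≈ 0.095049
```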
=== Learning Rate Schedulers Complete ===