Learning Rate Schedulers
Learning rate scheduling is critical for training neural networks effectively. This example demonstrates all 8 Deepbox schedulers: StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR, LinearLR, ReduceLROnPlateau, OneCycleLR, and WarmupLR. Each scheduler is stepped epoch by epoch and its LR trajectory is printed.
Deepbox Modules Used
- deepbox/nn
- deepbox/optim

What You Will Learn
- StepLR/MultiStepLR for simple milestone-based decay
- CosineAnnealingLR for smooth decay — popular in modern training
- OneCycleLR ramps up then down — often gives best results
- WarmupLR prevents early training instability with large learning rates
- Call scheduler.step() once per epoch (once per optimizer step for OneCycleLR); see the placement sketch below
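
Concretely, here is where the step() call sits in a typical loop. This is a minimal sketch with placeholder names (epochs, trainOneEpoch, scheduler), not a specific Deepbox API:

```ts
// Placement sketch: placeholder names only, not Deepbox-specific APIs.
declare const epochs: number;
declare function trainOneEpoch(): void; // batch loop: forward, backward, optimizer.step()
declare const scheduler: { step: (metric?: number) => void };

for (let epoch = 0; epoch < epochs; epoch++) {
  trainOneEpoch();
  scheduler.step(); // most schedulers advance once per epoch
  // OneCycleLR instead steps once per batch, inside the batch loop;
  // ReduceLROnPlateau takes the monitored metric: scheduler.step(valLoss)
}
```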
Source Code
16-lr-schedulers/index.ts
```ts
import { Linear, ReLU, Sequential } from "deepbox/nn";
import {
  Adam,
  CosineAnnealingLR,
  ExponentialLR,
  LinearLR,
  MultiStepLR,
  OneCycleLR,
  ReduceLROnPlateau,
  StepLR,
  WarmupLR,
} from "deepbox/optim";

console.log("=== Learning Rate Schedulers ===\n");

// Create a small model and optimizer for demonstration
const createOptimizer = () => {
  const model = new Sequential(new Linear(4, 8), new ReLU(), new Linear(8, 1));
  return new Adam(model.parameters(), { lr: 0.1 });
};

// ---------------------------------------------------------------------------
// Part 1: StepLR — decay every N steps
// ---------------------------------------------------------------------------
console.log("--- Part 1: StepLR ---");

const opt1 = createOptimizer();
const stepLR = new StepLR(opt1, { stepSize: 3, gamma: 0.5 });

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = stepLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  stepLR.step();
}

// ---------------------------------------------------------------------------
// Part 2: MultiStepLR — decay at specific milestones
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: MultiStepLR ---");

const opt2 = createOptimizer();
const multiStepLR = new MultiStepLR(opt2, {
  milestones: [3, 6, 8],
  gamma: 0.5,
});

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = multiStepLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  multiStepLR.step();
}

// ---------------------------------------------------------------------------
// Part 3: ExponentialLR — exponential decay each epoch
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: ExponentialLR ---");

const opt3 = createOptimizer();
const expLR = new ExponentialLR(opt3, { gamma: 0.9 });

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = expLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  expLR.step();
}

// ---------------------------------------------------------------------------
// Part 4: CosineAnnealingLR — cosine annealing
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: CosineAnnealingLR ---");

const opt4 = createOptimizer();
const cosineLR = new CosineAnnealingLR(opt4, { T_max: 10, etaMin: 0.001 });

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = cosineLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  cosineLR.step();
}

// ---------------------------------------------------------------------------
// Part 5: LinearLR — linear warmup / decay
// ---------------------------------------------------------------------------
console.log("\n--- Part 5: LinearLR ---");

const opt5 = createOptimizer();
const linearLR = new LinearLR(opt5, {
  startFactor: 0.1,
  endFactor: 1.0,
  totalIters: 5,
});

for (let epoch = 0; epoch < 8; epoch++) {
  const lr = linearLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  linearLR.step();
}

// ---------------------------------------------------------------------------
// Part 6: ReduceLROnPlateau — reduce when metric stops improving
// ---------------------------------------------------------------------------
console.log("\n--- Part 6: ReduceLROnPlateau ---");

const opt6 = createOptimizer();
const plateauLR = new ReduceLROnPlateau(opt6, { factor: 0.5, patience: 2 });

// Simulate a training loop where loss plateaus
const fakeLosses = [1.0, 0.8, 0.6, 0.59, 0.58, 0.58, 0.58, 0.3, 0.29, 0.29];
for (let epoch = 0; epoch < fakeLosses.length; epoch++) {
  const loss = fakeLosses[epoch];
  plateauLR.step(loss);
  console.log(
    `  Epoch ${epoch}: loss = ${loss.toFixed(2)}, lr = ${plateauLR.getLastLr()[0]?.toFixed(6)}`
  );
}

// ---------------------------------------------------------------------------
// Part 7: WarmupLR — linear warmup then constant
// ---------------------------------------------------------------------------
console.log("\n--- Part 7: WarmupLR ---");

const opt7 = createOptimizer();
const warmupLR = new WarmupLR(opt7, null, { warmupEpochs: 5 });

for (let epoch = 0; epoch < 8; epoch++) {
  const lr = warmupLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  warmupLR.step();
}

// ---------------------------------------------------------------------------
// Part 8: OneCycleLR — super-convergence schedule
// ---------------------------------------------------------------------------
console.log("\n--- Part 8: OneCycleLR ---");

const opt8 = createOptimizer();
const oneCycleLR = new OneCycleLR(opt8, { maxLr: 0.1, totalSteps: 10 });

for (let epoch = 0; epoch < 10; epoch++) {
  const lr = oneCycleLR.getLastLr()[0] ?? 0;
  console.log(`  Epoch ${epoch}: lr = ${lr.toFixed(6)}`);
  oneCycleLR.step();
}

console.log("\n=== Learning Rate Schedulers Complete ===");
```

Console Output
$ npx tsx 16-lr-schedulers/index.ts
=== Learning Rate Schedulers ===
--- Part 1: StepLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.100000
Epoch 2: lr = 0.100000
Epoch 3: lr = 0.100000
Epoch 4: lr = 0.050000
Epoch 5: lr = 0.050000
Epoch 6: lr = 0.050000
Epoch 7: lr = 0.025000
Epoch 8: lr = 0.025000
Epoch 9: lr = 0.025000
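
The rule behind this trajectory is simple step decay. The sketch below is the textbook formula, not Deepbox's internal code; note that the loop logs the LR before calling step(), so the printed values lag the formula by one epoch:

```ts
// Textbook step decay: multiply by gamma once every `stepSize` epochs.
const stepDecay = (lr0: number, gamma: number, stepSize: number, epoch: number) =>
  lr0 * gamma ** Math.floor(epoch / stepSize);
// stepDecay(0.1, 0.5, 3, 3) === 0.05, matching the drop printed at epoch 4 above
```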
--- Part 2: MultiStepLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.100000
Epoch 2: lr = 0.100000
Epoch 3: lr = 0.100000
Epoch 4: lr = 0.050000
Epoch 5: lr = 0.050000
Epoch 6: lr = 0.050000
Epoch 7: lr = 0.025000
Epoch 8: lr = 0.025000
Epoch 9: lr = 0.012500
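
MultiStepLR applies the same multiplicative cut, but only at the listed milestones. A sketch of the standard rule (again one epoch ahead of the printed trace):

```ts
// Multi-step decay: multiply by gamma once per milestone already passed.
const multiStepDecay = (lr0: number, gamma: number, milestones: number[], epoch: number) =>
  lr0 * gamma ** milestones.filter((m) => epoch >= m).length;
// multiStepDecay(0.1, 0.5, [3, 6, 8], 6) === 0.025, printed at epoch 7 above
```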
--- Part 3: ExponentialLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.100000
Epoch 2: lr = 0.090000
Epoch 3: lr = 0.081000
Epoch 4: lr = 0.072900
Epoch 5: lr = 0.065610
Epoch 6: lr = 0.059049
Epoch 7: lr = 0.053144
Epoch 8: lr = 0.047830
Epoch 9: lr = 0.043047
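
Exponential decay multiplies by gamma every epoch, so the LR follows lr0 · gamma^t. A minimal check against the numbers above:

```ts
// Exponential decay: lr_t = lr_0 * gamma^t.
const expDecay = (lr0: number, gamma: number, epoch: number) => lr0 * gamma ** epoch;
// expDecay(0.1, 0.9, 2) === 0.081, the value printed at epoch 3 above
```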
--- Part 4: CosineAnnealingLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.100000
Epoch 2: lr = 0.097577
Epoch 3: lr = 0.090546
Epoch 4: lr = 0.079595
Epoch 5: lr = 0.065796
Epoch 6: lr = 0.050500
Epoch 7: lr = 0.035204
Epoch 8: lr = 0.021405
Epoch 9: lr = 0.010454
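
Cosine annealing follows half a cosine wave from the base LR down to etaMin over T_max steps. A sketch of the standard formula, with the printed trace again shifted by one epoch:

```ts
// Cosine annealing: etaMin + (lr0 - etaMin) * (1 + cos(pi * t / T_max)) / 2.
const cosineAnneal = (lr0: number, etaMin: number, tMax: number, t: number) =>
  etaMin + 0.5 * (lr0 - etaMin) * (1 + Math.cos((Math.PI * t) / tMax));
// cosineAnneal(0.1, 0.001, 10, 1) ≈ 0.097577, printed at epoch 2 above
```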
--- Part 5: LinearLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.010000
Epoch 2: lr = 0.028000
Epoch 3: lr = 0.046000
Epoch 4: lr = 0.064000
Epoch 5: lr = 0.082000
Epoch 6: lr = 0.100000
Epoch 7: lr = 0.100000
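
LinearLR scales the base LR by a factor interpolated linearly from startFactor to endFactor over totalIters, then holds it. A sketch (epoch 0 above shows the optimizer's base LR before the first step()):

```ts
// Linear factor schedule: lr_t = lr0 * (start + (end - start) * min(t, total) / total).
const linearLr = (lr0: number, start: number, end: number, total: number, t: number) =>
  lr0 * (start + ((end - start) * Math.min(t, total)) / total);
// linearLr(0.1, 0.1, 1.0, 5, 0) === 0.01, printed at epoch 1 above
```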
--- Part 6: ReduceLROnPlateau ---
Epoch 0: loss = 1.00, lr = 0.100000
Epoch 1: loss = 0.80, lr = 0.100000
Epoch 2: loss = 0.60, lr = 0.100000
Epoch 3: loss = 0.59, lr = 0.100000
Epoch 4: loss = 0.58, lr = 0.100000
Epoch 5: loss = 0.58, lr = 0.100000
Epoch 6: loss = 0.58, lr = 0.100000
Epoch 7: loss = 0.30, lr = 0.100000
Epoch 8: loss = 0.29, lr = 0.100000
Epoch 9: loss = 0.29, lr = 0.100000
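
Note that the LR never actually drops here: with patience 2, a cut triggers only after the loss fails to improve for more than two consecutive epochs, and the plateau at 0.58 lasts exactly two epochs (5 and 6) before the loss improves again. A sketch of that bookkeeping, under assumed semantics consistent with the trace above:

```ts
// Plateau logic sketch (assumed semantics, not Deepbox's internal code).
let best = Infinity;
let badEpochs = 0;
let lr = 0.1;
for (const loss of [1.0, 0.8, 0.6, 0.59, 0.58, 0.58, 0.58, 0.3, 0.29, 0.29]) {
  if (loss < best) {
    best = loss;
    badEpochs = 0;
  } else if (++badEpochs > 2) {
    lr *= 0.5; // cut the LR once patience is exhausted
    badEpochs = 0;
  }
}
// lr is still 0.1: no plateau lasted long enough to trigger a cut
```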
--- Part 7: WarmupLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.020000
Epoch 2: lr = 0.040000
Epoch 3: lr = 0.060000
Epoch 4: lr = 0.080000
Epoch 5: lr = 0.100000
Epoch 6: lr = 0.100000
Epoch 7: lr = 0.100000
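
WarmupLR ramps the LR linearly up to the optimizer's base LR over warmupEpochs, then holds it constant (epoch 0 above prints 0.1 simply because the optimizer still carries its base LR before the first step()). A sketch:

```ts
// Linear warmup then constant: ramp up to lr0 over `warmupEpochs` steps.
const warmupLr = (lr0: number, warmupEpochs: number, t: number) =>
  lr0 * Math.min(t / warmupEpochs, 1);
// warmupLr(0.1, 5, 1) === 0.02, the value printed at epoch 1 above
```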
--- Part 8: OneCycleLR ---
Epoch 0: lr = 0.100000
Epoch 1: lr = 0.004000
Epoch 2: lr = 0.036000
Epoch 3: lr = 0.068000
Epoch 4: lr = 0.100000
Epoch 5: lr = 0.095049
Epoch 6: lr = 0.081176
Epoch 7: lr = 0.061130
Epoch 8: lr = 0.038880
Epoch 9: lr = 0.018834
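
The one-cycle policy ramps up to maxLr and then anneals down close to zero. The sketch below reverse-engineers the trace above under assumed defaults (initial LR of maxLr/25 and a 30% ramp-up phase, mirroring common one-cycle conventions); Deepbox's actual defaults may differ, and the printed trace is again offset by one epoch, with epoch 0 showing the base LR:

```ts
// One-cycle sketch: linear ramp-up for the first 30% of steps, cosine descent after.
const oneCycle = (maxLr: number, totalSteps: number, t: number) => {
  const initLr = maxLr / 25; // assumed div factor of 25
  const up = Math.round(0.3 * totalSteps); // assumed 30% ramp-up phase
  if (t <= up) return initLr + ((maxLr - initLr) * t) / up;
  return maxLr * 0.5 * (1 + Math.cos((Math.PI * (t - up)) / (totalSteps - up)));
};
// oneCycle(0.1, 10, 1) === 0.036; oneCycle(0.1, 10, 4) ≈ 0.095049
```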
=== Learning Rate Schedulers Complete ===