DataLoader — Batching & Shuffling
Training on the entire dataset at once is often impractical. The DataLoader class splits your data into mini-batches for stochastic gradient descent. This example demonstrates batching, shuffling, dropLast, and inference without labels.
Deepbox Modules Used
- deepbox/datasets
- deepbox/ndarray

What You Will Learn
- DataLoader splits data into mini-batches for SGD training
- Use shuffle: true to randomize order each epoch — prevents ordering bias
- dropLast: true discards the final batch if it's smaller than batchSize
- Iterate with for-of: each iteration yields [X_batch, y_batch] tensors
- Set batchSize based on memory — larger = smoother gradients, smaller = more noise
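The batch-count behavior described above reduces to simple arithmetic: without `dropLast` the loader yields the ceiling of `n / batchSize` batches, with `dropLast` the floor. A minimal sketch in plain TypeScript (the `batchCount` helper is hypothetical, for illustration only, not part of deepbox):

```typescript
// Number of batches a loader yields for n samples.
// dropLast=false keeps the short final batch; dropLast=true discards it.
function batchCount(n: number, batchSize: number, dropLast: boolean): number {
  return dropLast ? Math.floor(n / batchSize) : Math.ceil(n / batchSize);
}

console.log(batchCount(10, 3, false)); // 4 batches: 3 + 3 + 3 + 1
console.log(batchCount(10, 3, true));  // 3 batches: trailing sample discarded
console.log(batchCount(10, 4, false)); // 3 batches: 4 + 4 + 2
```

These counts match the Part 1 and Part 3 runs below (10 samples, batch size 3).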
Source Code
31-dataloader/index.ts
```ts
import { DataLoader } from "deepbox/datasets";
import { tensor } from "deepbox/ndarray";

console.log("=== DataLoader: Batching & Shuffling ===\n");

// ---------------------------------------------------------------------------
// Part 1: Basic batching
// ---------------------------------------------------------------------------
console.log("--- Part 1: Basic Batching ---");

const X = tensor([
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8],
  [9, 10],
  [11, 12],
  [13, 14],
  [15, 16],
  [17, 18],
  [19, 20],
]);
const y = tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1]);

const loader = new DataLoader(X, y, { batchSize: 3 });
console.log(`Dataset size: ${X.shape[0]} samples`);
console.log(`Batch size: 3`);
console.log(`Expected batches: 4 (last batch has 1 sample)\n`);

let batchIdx = 0;
for (const [xBatch, yBatch] of loader) {
  console.log(
    `  Batch ${batchIdx}: X shape [${xBatch.shape.join(", ")}], y shape [${yBatch.shape.join(", ")}]`
  );
  batchIdx++;
}

// ---------------------------------------------------------------------------
// Part 2: Shuffling with deterministic seed
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: Shuffled Iteration ---");

const shuffledLoader = new DataLoader(X, y, {
  batchSize: 5,
  shuffle: true,
  seed: 42,
});
console.log("DataLoader(batchSize=5, shuffle=true, seed=42)");

console.log("\nFirst iteration:");
for (const [xBatch, yBatch] of shuffledLoader) {
  console.log(`  X first row: ${xBatch.toString().split("\n")[0]}, y: ${yBatch.toString()}`);
}

console.log("\nSecond iteration (same seed = same order):");
for (const [xBatch, yBatch] of shuffledLoader) {
  console.log(`  X first row: ${xBatch.toString().split("\n")[0]}, y: ${yBatch.toString()}`);
}

// ---------------------------------------------------------------------------
// Part 3: dropLast — discard incomplete final batch
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: Drop Last Batch ---");

const dropLoader = new DataLoader(X, y, {
  batchSize: 3,
  dropLast: true,
});
console.log("DataLoader(batchSize=3, dropLast=true)");
console.log(`Dataset: ${X.shape[0]} samples, batch: 3, dropLast: true`);

let dropBatchCount = 0;
for (const [xBatch] of dropLoader) {
  console.log(`  Batch ${dropBatchCount}: shape [${xBatch.shape.join(", ")}]`);
  dropBatchCount++;
}
console.log(`Total batches: ${dropBatchCount} (incomplete last batch dropped)`);

// ---------------------------------------------------------------------------
// Part 4: Inference without labels
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: Inference Without Labels ---");

const testLoader = new DataLoader(X, undefined, {
  batchSize: 4,
  shuffle: false,
});
console.log("DataLoader(X, undefined, { batchSize: 4 })");

let testBatchIdx = 0;
for (const [xBatch] of testLoader) {
  console.log(`  Batch ${testBatchIdx}: X shape [${xBatch.shape.join(", ")}]`);
  testBatchIdx++;
}

console.log("\n=== DataLoader Complete ===");
```

Console Output
```
$ npx tsx 31-dataloader/index.ts
=== DataLoader: Batching & Shuffling ===

--- Part 1: Basic Batching ---
Dataset size: 10 samples
Batch size: 3
Expected batches: 4 (last batch has 1 sample)

  Batch 0: X shape [3, 2], y shape [3]
  Batch 1: X shape [3, 2], y shape [3]
  Batch 2: X shape [3, 2], y shape [3]
  Batch 3: X shape [1, 2], y shape [1]

--- Part 2: Shuffled Iteration ---
DataLoader(batchSize=5, shuffle=true, seed=42)

First iteration:
  X first row: tensor([[17, 18], y: tensor([0, 1, 1, 0, 1], dtype=float32)
  X first row: tensor([[7, 8], y: tensor([1, 0, 1, 0, 0], dtype=float32)

Second iteration (same seed = same order):
  X first row: tensor([[17, 18], y: tensor([0, 1, 1, 0, 1], dtype=float32)
  X first row: tensor([[7, 8], y: tensor([1, 0, 1, 0, 0], dtype=float32)

--- Part 3: Drop Last Batch ---
DataLoader(batchSize=3, dropLast=true)
Dataset: 10 samples, batch: 3, dropLast: true
  Batch 0: shape [3, 2]
  Batch 1: shape [3, 2]
  Batch 2: shape [3, 2]
Total batches: 3 (incomplete last batch dropped)

--- Part 4: Inference Without Labels ---
DataLoader(X, undefined, { batchSize: 4 })
  Batch 0: X shape [4, 2]
  Batch 1: X shape [4, 2]
  Batch 2: X shape [2, 2]

=== DataLoader Complete ===
```
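The mechanics behind shuffle, batching, and dropLast can be seen in a self-contained sketch over plain arrays. This is an illustration of the iteration pattern, not deepbox's actual implementation; the `miniLoader` generator and its mulberry32-seeded shuffle are assumptions introduced here to show why a fixed seed reproduces the same order every epoch:

```typescript
// Minimal loader sketch: shuffles indices (not data), then slices into batches.
function* miniLoader<T>(
  rows: T[],
  opts: { batchSize: number; shuffle?: boolean; seed?: number; dropLast?: boolean }
): Generator<T[]> {
  const idx = rows.map((_, i) => i);
  if (opts.shuffle) {
    // mulberry32: tiny deterministic PRNG, so the same seed gives the same order.
    let s = opts.seed ?? 0;
    const rand = () => {
      s = (s + 0x6d2b79f5) | 0;
      let t = Math.imul(s ^ (s >>> 15), 1 | s);
      t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) | 0;
      return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
    // Fisher-Yates shuffle of the index array.
    for (let i = idx.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [idx[i], idx[j]] = [idx[j], idx[i]];
    }
  }
  for (let start = 0; start < idx.length; start += opts.batchSize) {
    const batch = idx.slice(start, start + opts.batchSize);
    if (opts.dropLast && batch.length < opts.batchSize) return; // discard short tail
    yield batch.map((i) => rows[i]);
  }
}

const rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]];
for (const batch of miniLoader(rows, { batchSize: 2, dropLast: true })) {
  console.log(batch.length); // prints 2 twice; the single leftover row is dropped
}
```

Shuffling indices rather than copying rows is the usual design choice: the data stays in place, and each epoch only permutes a small integer array.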