Example 31 (beginner)

Tags: DataLoader, Batching, Training

DataLoader — Batching & Shuffling

Training on the entire dataset at once is often impractical. The DataLoader class splits your data into mini-batches for stochastic gradient descent. This example demonstrates batching, shuffling, dropLast, and inference without labels.
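To make the mechanics concrete, here is a minimal, illustrative sketch of what a batching loader does under the hood. This is not deepbox's actual implementation; the `miniBatches` generator and its options object are hypothetical stand-ins for the same idea: optionally shuffle an index array each epoch, then yield consecutive slices of `batchSize` elements.

```typescript
// Illustrative sketch only — not deepbox's implementation.
// Shuffle indices (optionally), then yield slices of batchSize samples.
function* miniBatches<T>(
  data: T[],
  batchSize: number,
  opts: { shuffle?: boolean; dropLast?: boolean } = {}
): Generator<T[]> {
  const idx = data.map((_, i) => i);
  if (opts.shuffle) {
    // Fisher-Yates shuffle of the index array
    for (let i = idx.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [idx[i], idx[j]] = [idx[j], idx[i]];
    }
  }
  for (let start = 0; start < idx.length; start += batchSize) {
    const batch = idx.slice(start, start + batchSize);
    // With dropLast, an incomplete final batch is discarded.
    if (opts.dropLast && batch.length < batchSize) break;
    yield batch.map((i) => data[i] as T);
  }
}

// 10 samples with batchSize 3 → batch sizes 3, 3, 3, 1
for (const b of miniBatches([...Array(10).keys()], 3)) {
  console.log(b.length);
}
```

deepbox's DataLoader wraps the same pattern around tensors, slicing both X and y along the sample axis.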

Deepbox Modules Used

deepbox/datasets, deepbox/ndarray

What You Will Learn

  • DataLoader splits data into mini-batches for SGD training
  • Use shuffle: true to randomize order each epoch — prevents ordering bias
  • dropLast: true discards the final batch if it's smaller than batchSize
  • Iterate with for-of: each iteration yields [X_batch, y_batch] tensors
  • Set batchSize based on memory — larger = smoother gradients, smaller = more noise
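The batch-count arithmetic implied by the bullets above is easy to verify: without dropLast the loader yields ceil(n / batchSize) batches, with dropLast it yields floor(n / batchSize). The `numBatches` helper below is a hypothetical illustration, not part of deepbox.

```typescript
// Hypothetical helper: how many batches a loader yields for n samples.
function numBatches(n: number, batchSize: number, dropLast = false): number {
  return dropLast ? Math.floor(n / batchSize) : Math.ceil(n / batchSize);
}

console.log(numBatches(10, 3)); // 4 — final batch holds just 1 sample
console.log(numBatches(10, 3, true)); // 3 — incomplete batch discarded
console.log(numBatches(10, 5)); // 2 — divides evenly, dropLast irrelevant
```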

Source Code

31-dataloader/index.ts
import { DataLoader } from "deepbox/datasets";
import { tensor } from "deepbox/ndarray";

console.log("=== DataLoader: Batching & Shuffling ===\n");

// ---------------------------------------------------------------------------
// Part 1: Basic batching
// ---------------------------------------------------------------------------
console.log("--- Part 1: Basic Batching ---");

const X = tensor([
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8],
  [9, 10],
  [11, 12],
  [13, 14],
  [15, 16],
  [17, 18],
  [19, 20],
]);
const y = tensor([0, 1, 0, 1, 0, 1, 0, 1, 0, 1]);

const loader = new DataLoader(X, y, { batchSize: 3 });
console.log(`Dataset size: ${X.shape[0]} samples`);
console.log(`Batch size: 3`);
console.log(`Expected batches: 4 (last batch has 1 sample)\n`);

let batchIdx = 0;
for (const [xBatch, yBatch] of loader) {
  console.log(
    `  Batch ${batchIdx}: X shape [${xBatch.shape.join(", ")}], y shape [${yBatch.shape.join(", ")}]`
  );
  batchIdx++;
}

// ---------------------------------------------------------------------------
// Part 2: Shuffling with deterministic seed
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: Shuffled Iteration ---");

const shuffledLoader = new DataLoader(X, y, {
  batchSize: 5,
  shuffle: true,
  seed: 42,
});
console.log("DataLoader(batchSize=5, shuffle=true, seed=42)");

console.log("\nFirst iteration:");
for (const [xBatch, yBatch] of shuffledLoader) {
  console.log(`  X first row: ${xBatch.toString().split("\n")[0]}, y: ${yBatch.toString()}`);
}

console.log("\nSecond iteration (same seed = same order):");
for (const [xBatch, yBatch] of shuffledLoader) {
  console.log(`  X first row: ${xBatch.toString().split("\n")[0]}, y: ${yBatch.toString()}`);
}

// ---------------------------------------------------------------------------
// Part 3: dropLast — discard incomplete final batch
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: Drop Last Batch ---");

const dropLoader = new DataLoader(X, y, {
  batchSize: 3,
  dropLast: true,
});
console.log("DataLoader(batchSize=3, dropLast=true)");
console.log(`Dataset: ${X.shape[0]} samples, batch: 3, dropLast: true`);

let dropBatchCount = 0;
for (const [xBatch] of dropLoader) {
  console.log(`  Batch ${dropBatchCount}: shape [${xBatch.shape.join(", ")}]`);
  dropBatchCount++;
}
console.log(`Total batches: ${dropBatchCount} (incomplete last batch dropped)`);

// ---------------------------------------------------------------------------
// Part 4: Inference without labels
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: Inference Without Labels ---");

const testLoader = new DataLoader(X, undefined, {
  batchSize: 4,
  shuffle: false,
});
console.log("DataLoader(X, undefined, { batchSize: 4 })");

let testBatchIdx = 0;
for (const [xBatch] of testLoader) {
  console.log(`  Batch ${testBatchIdx}: X shape [${xBatch.shape.join(", ")}]`);
  testBatchIdx++;
}

console.log("\n=== DataLoader Complete ===");

Console Output

$ npx tsx 31-dataloader/index.ts
=== DataLoader: Batching & Shuffling ===

--- Part 1: Basic Batching ---
Dataset size: 10 samples
Batch size: 3
Expected batches: 4 (last batch has 1 sample)

  Batch 0: X shape [3, 2], y shape [3]
  Batch 1: X shape [3, 2], y shape [3]
  Batch 2: X shape [3, 2], y shape [3]
  Batch 3: X shape [1, 2], y shape [1]

--- Part 2: Shuffled Iteration ---
DataLoader(batchSize=5, shuffle=true, seed=42)

First iteration:
  X first row: tensor([[17, 18], y: tensor([0, 1, 1, 0, 1], dtype=float32)
  X first row: tensor([[7, 8], y: tensor([1, 0, 1, 0, 0], dtype=float32)

Second iteration (same seed = same order):
  X first row: tensor([[17, 18], y: tensor([0, 1, 1, 0, 1], dtype=float32)
  X first row: tensor([[7, 8], y: tensor([1, 0, 1, 0, 0], dtype=float32)

--- Part 3: Drop Last Batch ---
DataLoader(batchSize=3, dropLast=true)
Dataset: 10 samples, batch: 3, dropLast: true
  Batch 0: shape [3, 2]
  Batch 1: shape [3, 2]
  Batch 2: shape [3, 2]
Total batches: 3 (incomplete last batch dropped)

--- Part 4: Inference Without Labels ---
DataLoader(X, undefined, { batchSize: 4 })
  Batch 0: X shape [4, 2]
  Batch 1: X shape [4, 2]
  Batch 2: X shape [2, 2]

=== DataLoader Complete ===