Example 28
Tags: RNN, LSTM, GRU, Sequences, Neural Networks
Recurrent Neural Network Layers
Recurrent neural networks process sequential data by maintaining a hidden state across time steps. This example demonstrates the RNN, LSTM, and GRU layers and their forward() and forwardWithState() methods.
Deepbox Modules Used
- deepbox/ndarray
- deepbox/nn

What You Will Learn
- RNN is simplest but suffers from vanishing gradients for long sequences
- LSTM adds forget/input/output gates — handles long-range dependencies
- GRU simplifies LSTM to 2 gates — fewer parameters, often similar performance
- Output shape: [batch, seq_len, hidden_size] — one hidden state per time step
- Use final hidden state for classification, full output for sequence-to-sequence
Source Code
28-rnn-lstm-gru/index.ts
```ts
import { GradTensor, tensor } from "deepbox/ndarray";
import { GRU, LSTM, RNN } from "deepbox/nn";

console.log("=== Recurrent Neural Network Layers ===\n");

// ---------------------------------------------------------------------------
// Part 1: Simple RNN
// ---------------------------------------------------------------------------
console.log("--- Part 1: Simple RNN ---");

// RNN(inputSize, hiddenSize, options)
// Input shape (batchFirst=true): (batch, seqLen, inputSize)
const rnn = new RNN(4, 8, { batchFirst: true });
console.log("RNN(inputSize=4, hiddenSize=8, batchFirst=true)");

// Batch of 2 sequences, each with 3 time steps and 4 features
const rnnInput = tensor([
  [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
  ],
  [
    [13, 14, 15, 16],
    [17, 18, 19, 20],
    [21, 22, 23, 24],
  ],
]);
console.log(`Input shape: [${rnnInput.shape.join(", ")}]`);

const rnnResult = rnn.forward(rnnInput);
const rnnOut = rnnResult instanceof GradTensor ? rnnResult.tensor : rnnResult;
console.log(`Output shape: [${rnnOut.shape.join(", ")}]`);
console.log(" Output contains hidden states for all time steps\n");

// ---------------------------------------------------------------------------
// Part 2: LSTM (Long Short-Term Memory)
// ---------------------------------------------------------------------------
console.log("--- Part 2: LSTM ---");

// LSTM adds cell state for better long-range dependencies
const lstm = new LSTM(4, 8, { batchFirst: true });
console.log("LSTM(inputSize=4, hiddenSize=8, batchFirst=true)");
console.log(`Input shape: [${rnnInput.shape.join(", ")}]`);

const lstmResult = lstm.forward(rnnInput);
const lstmOut = lstmResult instanceof GradTensor ? lstmResult.tensor : lstmResult;
console.log(`Output shape: [${lstmOut.shape.join(", ")}]`);
console.log(" LSTM uses forget/input/output gates for selective memory\n");

// ---------------------------------------------------------------------------
// Part 3: GRU (Gated Recurrent Unit)
// ---------------------------------------------------------------------------
console.log("--- Part 3: GRU ---");

// GRU is a simplified version of LSTM with fewer parameters
const gru = new GRU(4, 8, { batchFirst: true });
console.log("GRU(inputSize=4, hiddenSize=8, batchFirst=true)");
console.log(`Input shape: [${rnnInput.shape.join(", ")}]`);

const gruResult = gru.forward(rnnInput);
const gruOut = gruResult instanceof GradTensor ? gruResult.tensor : gruResult;
console.log(`Output shape: [${gruOut.shape.join(", ")}]`);
console.log(" GRU uses reset/update gates — fewer params than LSTM\n");

// ---------------------------------------------------------------------------
// Part 4: Multi-layer RNN
// ---------------------------------------------------------------------------
console.log("--- Part 4: Multi-Layer Stacking ---");

const deepRnn = new RNN(4, 16, { numLayers: 2, batchFirst: true });
console.log("RNN(inputSize=4, hiddenSize=16, numLayers=2)");
console.log(`Input shape: [${rnnInput.shape.join(", ")}]`);

const deepResult = deepRnn.forward(rnnInput);
const deepOut = deepResult instanceof GradTensor ? deepResult.tensor : deepResult;
console.log(`Output shape: [${deepOut.shape.join(", ")}]`);
console.log(" 2-layer RNN extracts higher-level sequential patterns\n");

// ---------------------------------------------------------------------------
// Part 5: Unbatched (single sequence) input
// ---------------------------------------------------------------------------
console.log("--- Part 5: Unbatched Input ---");

const singleSeq = tensor([
  [1, 2, 3, 4],
  [5, 6, 7, 8],
  [9, 10, 11, 12],
]);
console.log("Single sequence (no batch dim):");
console.log(`Input shape: [${singleSeq.shape.join(", ")}]`);

const singleResult = rnn.forward(singleSeq);
const singleOut = singleResult instanceof GradTensor ? singleResult.tensor : singleResult;
console.log(`Output shape: [${singleOut.shape.join(", ")}]`);
console.log(" 2D input is treated as unbatched sequence\n");

// ---------------------------------------------------------------------------
// Part 6: Parameter counts
// ---------------------------------------------------------------------------
console.log("--- Part 6: Parameter Comparison ---");
const rnnParams = Array.from(rnn.parameters()).length;
const lstmParams = Array.from(lstm.parameters()).length;
const gruParams = Array.from(gru.parameters()).length;
console.log(`RNN parameters: ${rnnParams}`);
console.log(`LSTM parameters: ${lstmParams} (4x gates)`);
console.log(`GRU parameters: ${gruParams} (3x gates)`);

console.log("\n=== Recurrent Layers Complete ===");
```

Console Output
```
$ npx tsx 28-rnn-lstm-gru/index.ts
=== Recurrent Neural Network Layers ===

--- Part 1: Simple RNN ---
RNN(inputSize=4, hiddenSize=8, batchFirst=true)
Input shape: [2, 3, 4]
Output shape: [2, 3, 8]
 Output contains hidden states for all time steps

--- Part 2: LSTM ---
LSTM(inputSize=4, hiddenSize=8, batchFirst=true)
Input shape: [2, 3, 4]
Output shape: [2, 3, 8]
 LSTM uses forget/input/output gates for selective memory

--- Part 3: GRU ---
GRU(inputSize=4, hiddenSize=8, batchFirst=true)
Input shape: [2, 3, 4]
Output shape: [2, 3, 8]
 GRU uses reset/update gates — fewer params than LSTM

--- Part 4: Multi-Layer Stacking ---
RNN(inputSize=4, hiddenSize=16, numLayers=2)
Input shape: [2, 3, 4]
Output shape: [2, 3, 16]
 2-layer RNN extracts higher-level sequential patterns

--- Part 5: Unbatched Input ---
Single sequence (no batch dim):
Input shape: [3, 4]
Output shape: [3, 8]
 2D input is treated as unbatched sequence

--- Part 6: Parameter Comparison ---
RNN parameters: 4
LSTM parameters: 4 (4x gates)
GRU parameters: 4 (3x gates)

=== Recurrent Layers Complete ===
```