Activation Functions
Activation functions introduce non-linearity into neural networks. This example evaluates every activation function in Deepbox on the same input tensor so you can compare their behavior side by side, and renders an SVG plot overlaying ReLU, Sigmoid, and GELU.
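Before looking at the individual functions, it helps to see why non-linearity matters at all. The scalar sketch below is plain TypeScript, independent of Deepbox, with arbitrary illustrative weights: two linear layers composed without an activation collapse into a single linear map, while inserting ReLU between them does not.

```ts
// Two 1-D "layers" with hand-picked weights (illustrative values only).
const w1 = 2, b1 = 1;
const w2 = 3, b2 = -4;

// Composing linear maps yields another linear map:
// w2 * (w1 * x + b1) + b2  ==  (w2 * w1) * x + (w2 * b1 + b2)
const linearStack = (x: number) => w2 * (w1 * x + b1) + b2;
const collapsed = (x: number) => w2 * w1 * x + (w2 * b1 + b2);
console.log(linearStack(1.5) === collapsed(1.5)); // true: depth added nothing

// Inserting ReLU between the layers breaks the collapse:
const reluStack = (x: number) => w2 * Math.max(0, w1 * x + b1) + b2;
console.log(reluStack(-2), collapsed(-2)); // -4 vs -13: genuinely different functions
```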
Deepbox Modules Used
- deepbox/ndarray
- deepbox/plot

What You Will Learn
- ReLU is the default choice — simple and effective
- Sigmoid for binary output, Softmax for multi-class probabilities
- GELU/Mish/Swish are modern smooth alternatives to ReLU
- ELU and LeakyReLU mitigate the 'dying ReLU' problem by keeping a nonzero gradient for negative inputs
- LogSoftmax is numerically stable for NLL loss computation (see the sketch after this list)
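The LogSoftmax point deserves a concrete demonstration, since the example code below doesn't exercise it. This is a plain-TypeScript sketch of the underlying trick rather than a use of Deepbox's own API: computing log(softmax(x)) naively overflows exp() for large logits, while the max-shifted log-sum-exp form stays finite.

```ts
// Naive route: exponentiate, normalize, then take the log.
// exp(1000) overflows to Infinity, so every entry becomes NaN.
const naiveLogSoftmax = (xs: number[]): number[] => {
  const exps = xs.map(Math.exp);
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => Math.log(e / sum));
};

// Stable route: logsoftmax(x_i) = (x_i - m) - log(Σ_j exp(x_j - m)), m = max(x).
// Shifting by the max keeps every exponent ≤ 0, so nothing overflows.
const stableLogSoftmax = (xs: number[]): number[] => {
  const m = Math.max(...xs);
  const shifted = xs.map((v) => v - m);
  const logSumExp = Math.log(shifted.map(Math.exp).reduce((a, b) => a + b, 0));
  return shifted.map((v) => v - logSumExp);
};

const logits = [1000, 1001, 1002];
console.log(naiveLogSoftmax(logits));  // [NaN, NaN, NaN]
console.log(stableLogSoftmax(logits)); // ≈ [-2.408, -1.408, -0.408]
```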
Source Code
15-activation-functions/index.ts
```ts
import { mkdirSync, writeFileSync } from "node:fs";
import {
  elu,
  gelu,
  leakyRelu,
  linspace,
  mish,
  relu,
  sigmoid,
  softmax,
  softplus,
  swish,
  tensor,
} from "deepbox/ndarray";
import { Figure } from "deepbox/plot";

console.log("=== Activation Functions ===\n");

mkdirSync("docs/examples/15-activation-functions/output", { recursive: true });

// Generate input range
const x = linspace(-5, 5, 100);

console.log("1. ReLU (Rectified Linear Unit):");
console.log("-".repeat(50));
const relu_out = relu(x);
console.log("f(x) = max(0, x)");
console.log("Use: Hidden layers, fast computation");
console.log("Range: [0, ∞)\n");

console.log("2. Sigmoid:");
console.log("-".repeat(50));
const sigmoid_out = sigmoid(x);
console.log("f(x) = 1 / (1 + e^(-x))");
console.log("Use: Binary classification output");
console.log("Range: (0, 1)\n");

console.log("3. Softmax:");
console.log("-".repeat(50));
const sample = tensor([1.0, 2.0, 3.0, 4.0]);
const softmax_out = softmax(sample);
console.log("Input:", sample.toString());
console.log("Output:", softmax_out.toString());
console.log("Use: Multi-class classification output");
console.log("Properties: Outputs sum to 1.0\n");

console.log("4. GELU (Gaussian Error Linear Unit):");
console.log("-".repeat(50));
const gelu_out = gelu(x);
console.log("f(x) = x * Φ(x), where Φ is CDF of normal distribution");
console.log("Use: Transformers, modern architectures");
console.log("Smoother than ReLU\n");

console.log("5. Leaky ReLU:");
console.log("-".repeat(50));
leakyRelu(x, 0.01);
console.log("f(x) = max(αx, x), α = 0.01");
console.log("Use: Prevents dying ReLU problem");
console.log("Allows small negative values\n");

console.log("6. ELU (Exponential Linear Unit):");
console.log("-".repeat(50));
elu(x, 1.0);
console.log("f(x) = x if x > 0, else α(e^x - 1)");
console.log("Use: Can produce negative outputs");
console.log("Smoother than ReLU\n");

console.log("7. Mish:");
console.log("-".repeat(50));
mish(x);
console.log("f(x) = x * tanh(softplus(x))");
console.log("Use: State-of-the-art in some tasks");
console.log("Self-regularizing, smooth\n");

console.log("8. Swish (SiLU):");
console.log("-".repeat(50));
swish(x);
console.log("f(x) = x * sigmoid(x)");
console.log("Use: Discovered by neural architecture search");
console.log("Non-monotonic, smooth\n");

console.log("9. Softplus:");
console.log("-".repeat(50));
softplus(x);
console.log("f(x) = log(1 + e^x)");
console.log("Use: Smooth approximation of ReLU");
console.log("Always positive\n");

// Visualize activations
console.log("Creating visualization...");
const fig = new Figure({ width: 800, height: 600 });
const ax = fig.addAxes();

ax.plot(x, relu_out, { color: "#1f77b4", linewidth: 2 });
ax.plot(x, sigmoid_out, { color: "#ff7f0e", linewidth: 2 });
ax.plot(x, gelu_out, { color: "#2ca02c", linewidth: 2 });
ax.setTitle("Activation Functions Comparison");
ax.setXLabel("Input");
ax.setYLabel("Output");

const svg = fig.renderSVG();
writeFileSync("docs/examples/15-activation-functions/output/activations.svg", svg.svg);
console.log("✓ Saved: output/activations.svg\n");

console.log("Selection Guide:");
console.log("• ReLU: Default choice, fast and effective");
console.log("• Sigmoid: Binary classification output layer");
console.log("• Softmax: Multi-class classification output layer");
console.log("• GELU/Mish/Swish: Modern alternatives, often better performance");
console.log("• Leaky ReLU/ELU: When dying ReLU is a problem");

console.log("\n✓ Activation functions complete!");
```

Console Output
```
$ npx tsx 15-activation-functions/index.ts
=== Activation Functions ===
1. ReLU (Rectified Linear Unit):
--------------------------------------------------
f(x) = max(0, x)
Use: Hidden layers, fast computation
Range: [0, ∞)
2. Sigmoid:
--------------------------------------------------
f(x) = 1 / (1 + e^(-x))
Use: Binary classification output
Range: (0, 1)
3. Softmax:
--------------------------------------------------
Input: tensor([1, 2, 3, 4], dtype=float32)
Output: tensor([0.03206, 0.08714, 0.2369, 0.6439], dtype=float64)
Use: Multi-class classification output
Properties: Outputs sum to 1.0
4. GELU (Gaussian Error Linear Unit):
--------------------------------------------------
f(x) = x * Φ(x), where Φ is CDF of normal distribution
Use: Transformers, modern architectures
Smoother than ReLU
5. Leaky ReLU:
--------------------------------------------------
f(x) = max(αx, x), α = 0.01
Use: Prevents dying ReLU problem
Allows small negative values
6. ELU (Exponential Linear Unit):
--------------------------------------------------
f(x) = x if x > 0, else α(e^x - 1)
Use: Can produce negative outputs
Smoother than ReLU
7. Mish:
--------------------------------------------------
f(x) = x * tanh(softplus(x))
Use: State-of-the-art in some tasks
Self-regularizing, smooth
8. Swish (SiLU):
--------------------------------------------------
f(x) = x * sigmoid(x)
Use: Discovered by neural architecture search
Non-monotonic, smooth
9. Softplus:
--------------------------------------------------
f(x) = log(1 + e^x)
Use: Smooth approximation of ReLU
Always positive
Creating visualization...
✓ Saved: output/activations.svg
Selection Guide:
• ReLU: Default choice, fast and effective
• Sigmoid: Binary classification output layer
• Softmax: Multi-class classification output layer
• GELU/Mish/Swish: Modern alternatives, often better performance
• Leaky ReLU/ELU: When dying ReLU is a problem
✓ Activation functions complete!
```
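One last note on the 'dying ReLU' entry in the selection guide: the difference is easiest to see in the gradients. The hand-written derivatives below are a plain-TypeScript sketch, not Deepbox autograd.

```ts
// For x < 0, ReLU's derivative is exactly 0: a neuron whose pre-activation
// stays negative receives no gradient and stops learning ("dies").
// Leaky ReLU keeps a small slope α, so some signal always flows back.
const reluGrad = (x: number): number => (x > 0 ? 1 : 0);
const leakyReluGrad = (x: number, alpha = 0.01): number => (x > 0 ? 1 : alpha);

for (const x of [-3, -0.5, 0.5, 3]) {
  console.log(`x=${x}  relu'=${reluGrad(x)}  leakyRelu'=${leakyReluGrad(x)}`);
}
// x=-3 and x=-0.5: relu'=0 (dead), leakyRelu'=0.01 (still learning)
```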