Example 17
beginner
Preprocessing
Encoders
Categorical Data

Preprocessing — Encoders

Machine learning models require numeric input, but real-world data often contains categorical variables. This example demonstrates all five Deepbox encoders: LabelEncoder, OneHotEncoder, OrdinalEncoder, LabelBinarizer, and MultiLabelBinarizer.
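To make "categories to numbers" concrete, here is a minimal plain-TypeScript sketch of label encoding. It is not Deepbox code: fitLabels, encodeLabels, and decodeLabels are hypothetical helpers, and the sorted vocabulary is an assumption inferred from the console output of this example (bird = 0, cat = 1, dog = 2).

```typescript
// Plain-TypeScript sketch of label encoding; not Deepbox code.
// fitLabels, encodeLabels and decodeLabels are hypothetical helper names.

// Learn the vocabulary: unique categories in sorted order (assumed ordering).
function fitLabels(values: string[]): string[] {
  return Array.from(new Set(values)).sort();
}

// Replace each category with its index in the learned vocabulary.
function encodeLabels(classes: string[], values: string[]): number[] {
  return values.map((v) => {
    const index = classes.indexOf(v);
    if (index === -1) {
      throw new Error(`Unseen category: ${v}`);
    }
    return index;
  });
}

// Map integer codes back to the original categories.
function decodeLabels(classes: string[], codes: number[]): string[] {
  return codes.map((code) => {
    const label = classes[code];
    if (label === undefined) {
      throw new Error(`Unknown code: ${code}`);
    }
    return label;
  });
}

const classes = fitLabels(["cat", "dog", "bird", "cat", "bird"]);
const codes = encodeLabels(classes, ["bird", "cat", "dog"]);
const labels = decodeLabels(classes, codes);

console.log(classes); // [ 'bird', 'cat', 'dog' ]
console.log(codes); // [ 0, 1, 2 ]
console.log(labels); // [ 'bird', 'cat', 'dog' ]
```

The round trip (encode, then decode) returning the original labels is the property the Deepbox example checks with inverseTransform.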

Deepbox Modules Used

  • deepbox/ndarray
  • deepbox/preprocess

What You Will Learn

  • LabelEncoder maps categories to integers — use for tree-based models
  • OneHotEncoder creates one binary column per category; the usual choice for linear models and neural nets, which would otherwise treat integer codes as magnitudes
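  • OrdinalEncoder encodes ordinal features (small < medium < large) as one integer per value; in the console output below the default codes follow sorted category order (high = 0, low = 1, medium = 2), so check the mapping before relying on the order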
  • All encoders support .inverseTransform() for decoding predictions back to labels
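The difference between one-hot and ordinal codes can be sketched in plain TypeScript. This is an illustration, not the Deepbox implementation: oneHotRows and ordinalCodes are hypothetical helpers, and the explicit ranking argument is an assumption made here to show how a semantic order such as small < medium < large could be supplied by the caller.

```typescript
// Plain-TypeScript sketch contrasting one-hot and ordinal codes; not Deepbox code.
// oneHotRows and ordinalCodes are hypothetical helper names.

// One-hot: one binary column per category, so no order is implied.
function oneHotRows(categories: string[], values: string[]): number[][] {
  return values.map((v) => {
    if (categories.indexOf(v) === -1) {
      throw new Error(`Unseen category: ${v}`);
    }
    return categories.map((c) => (c === v ? 1 : 0));
  });
}

// Ordinal: a single integer per value, taken from an explicit ranking.
function ordinalCodes(ranking: string[], values: string[]): number[] {
  return values.map((v) => {
    const rank = ranking.indexOf(v);
    if (rank === -1) {
      throw new Error(`Unseen category: ${v}`);
    }
    return rank;
  });
}

const sizes = ["medium", "small", "large"];

// Columns follow the order of the category list passed in.
const sizeOneHot = oneHotRows(["large", "medium", "small"], sizes);
// The ranking small < medium < large is spelled out by the caller.
const sizeOrdinal = ordinalCodes(["small", "medium", "large"], sizes);

console.log(sizeOneHot); // [ [ 0, 1, 0 ], [ 0, 0, 1 ], [ 1, 0, 0 ] ]
console.log(sizeOrdinal); // [ 1, 0, 2 ]
```

With the ordinal codes, comparisons such as 0 < 1 < 2 mirror small < medium < large; the one-hot rows carry no such comparison, which is why they suit nominal categories like colors.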

Source Code

17-preprocessing-encoders/index.ts
import { reshape, tensor } from "deepbox/ndarray";
import {
  LabelBinarizer,
  LabelEncoder,
  MultiLabelBinarizer,
  OneHotEncoder,
  OrdinalEncoder,
} from "deepbox/preprocess";

console.log("=== Preprocessing: Encoders ===\n");

// ---------------------------------------------------------------------------
// Part 1: LabelEncoder - map string labels to integers
// ---------------------------------------------------------------------------
console.log("--- Part 1: LabelEncoder ---");

const le = new LabelEncoder();
le.fit(tensor(["cat", "dog", "bird", "cat", "bird"]));

const encoded = le.transform(tensor(["bird", "cat", "dog"]));
console.log("Encoded:", encoded.toString());

const decoded = le.inverseTransform(encoded);
console.log("Decoded:", decoded.toString());

// ---------------------------------------------------------------------------
// Part 2: OneHotEncoder - one-hot vectors for categorical features
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: OneHotEncoder ---");

const ohe = new OneHotEncoder();
ohe.fit(reshape(tensor(["red", "green", "blue", "red", "blue"]), [5, 1]));

const oneHot = ohe.transform(reshape(tensor(["red", "blue", "green"]), [3, 1]));
console.log("One-hot encoded shape:", oneHot.shape);
console.log("One-hot encoded:\n", oneHot.toString());

// ---------------------------------------------------------------------------
// Part 3: OrdinalEncoder - ordinal integer encoding
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: OrdinalEncoder ---");

const oe = new OrdinalEncoder();
oe.fit(reshape(tensor(["low", "medium", "high", "medium", "low"]), [5, 1]));

const ordinal = oe.transform(reshape(tensor(["low", "high", "medium"]), [3, 1]));
console.log("Ordinal encoded:", ordinal.toString());

const ordinalDecoded = oe.inverseTransform(ordinal);
console.log("Decoded:", ordinalDecoded.toString());

// ---------------------------------------------------------------------------
// Part 4: LabelBinarizer - binary indicator for multi-class labels
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: LabelBinarizer ---");

const lb = new LabelBinarizer();
lb.fit(tensor(["cat", "dog", "bird"]));

const binarized = lb.transform(tensor(["cat", "bird", "dog"]));
console.log("Binarized shape:", binarized.shape);
console.log("Binarized:\n", binarized.toString());

const binarizedDecoded = lb.inverseTransform(binarized);
console.log("Decoded:", binarizedDecoded.toString());

// ---------------------------------------------------------------------------
// Part 5: MultiLabelBinarizer - multi-label binary encoding
// ---------------------------------------------------------------------------
console.log("\n--- Part 5: MultiLabelBinarizer ---");

const mlb = new MultiLabelBinarizer();
mlb.fit([["cat", "dog"], ["bird"], ["cat", "bird", "dog"]]);

const multiEncoded = mlb.transform([["cat", "bird"], ["dog"]]);
console.log("Multi-label encoded shape:", multiEncoded.shape);
console.log("Multi-label encoded:\n", multiEncoded.toString());

console.log("\n=== Preprocessing: Encoders Complete ===");

Console Output

$ npx tsx 17-preprocessing-encoders/index.ts
=== Preprocessing: Encoders ===

--- Part 1: LabelEncoder ---
Encoded: tensor([0, 1, 2], dtype=float64)
Decoded: tensor(["bird", "cat", "dog"], dtype=string)

--- Part 2: OneHotEncoder ---
One-hot encoded shape: [ 3, 3 ]
One-hot encoded:
 tensor([[0, 0, 1]
       [1, 0, 0]
       [0, 1, 0]], dtype=float64)

--- Part 3: OrdinalEncoder ---
Ordinal encoded: tensor([[1]
       [0]
       [2]], dtype=float64)
Decoded: tensor([["low"]
       ["high"]
       ["medium"]], dtype=string)

--- Part 4: LabelBinarizer ---
Binarized shape: [ 3, 3 ]
Binarized:
 tensor([[0, 1, 0]
       [1, 0, 0]
       [0, 0, 1]], dtype=float64)
Decoded: tensor(["cat", "bird", "dog"], dtype=string)

--- Part 5: MultiLabelBinarizer ---
Multi-label encoded shape: [ 2, 3 ]
Multi-label encoded:
 tensor([[1, 1, 0]
       [0, 0, 1]], dtype=float64)

=== Preprocessing: Encoders Complete ===
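
Reading the output: the column order in Parts 2, 4, and 5 is consistent with categories being sorted (for example bird, cat, dog), although the example does not state this explicitly. Under that assumption, the Part 5 matrix can be reproduced with a short plain-TypeScript sketch. It is not Deepbox code; fitClasses, binarize, and debinarize are hypothetical helpers, and the decoding step is shown only for illustration.

```typescript
// Plain-TypeScript sketch of multi-label binarization; not Deepbox code.
// fitClasses, binarize and debinarize are hypothetical helper names.

// Collect every label that appears in any sample, in sorted order (assumed).
function fitClasses(samples: string[][]): string[] {
  const all: string[] = [];
  for (const sample of samples) {
    all.push(...sample);
  }
  return Array.from(new Set(all)).sort();
}

// One row per sample, one column per class: 1 if the sample carries that label.
function binarize(classList: string[], samples: string[][]): number[][] {
  return samples.map((sample) =>
    classList.map((c) => (sample.indexOf(c) !== -1 ? 1 : 0)),
  );
}

// Recover the label set of each row from its indicator columns.
function debinarize(classList: string[], rows: number[][]): string[][] {
  return rows.map((row) => classList.filter((_, i) => row[i] === 1));
}

const classList = fitClasses([["cat", "dog"], ["bird"], ["cat", "bird", "dog"]]);
const indicator = binarize(classList, [["cat", "bird"], ["dog"]]);
const recovered = debinarize(classList, indicator);

console.log(classList); // [ 'bird', 'cat', 'dog' ]
console.log(indicator); // [ [ 1, 1, 0 ], [ 0, 0, 1 ] ]
console.log(recovered); // [ [ 'bird', 'cat' ], [ 'dog' ] ]
```

Note that the recovered labels come back in column order (bird before cat), not in the order they were passed in, because an indicator row records only which labels are present.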