Preprocessing — Encoders
Machine learning models require numeric input, but real-world data often contains categorical variables. This example demonstrates all five Deepbox encoders: LabelEncoder, OneHotEncoder, OrdinalEncoder, LabelBinarizer, and MultiLabelBinarizer.
Deepbox Modules Used
deepbox/ndarray
deepbox/preprocess
What You Will Learn
- LabelEncoder maps categories to integers — use for tree-based models
- OneHotEncoder creates binary columns — required for linear models and neural nets
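- OrdinalEncoder maps each category of an ordinal feature (small < medium < large) to an integer code; in this example the codes follow alphabetical order (high = 0, low = 1, medium = 2), so confirm that the assigned codes match the intended ranking before relying on them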
- All encoders support .inverseTransform() for decoding predictions back to labels
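The integer-code idea behind the first, third, and last points can be sketched without the library. The block below is plain TypeScript and is not Deepbox's implementation: `sortedUnique`, `encodeWithOrder`, and `decodeWithOrder` are hypothetical helpers, and the use of sorted distinct labels as the code order is an assumption inferred from this example's console output (bird = 0, cat = 1, dog = 2). The second usage shows one way to obtain low < medium < high by writing the ranking by hand.

```typescript
// Hypothetical helpers for illustration only; none of these are Deepbox APIs.

// Distinct values in sorted order (assumed to be how codes are assigned).
function sortedUnique(values: string[]): string[] {
  return values
    .filter((value, index, all) => all.indexOf(value) === index)
    .sort();
}

// Map each value to its position in `order`.
function encodeWithOrder(order: string[], values: string[]): number[] {
  return values.map((value) => {
    const code = order.indexOf(value);
    if (code === -1) {
      throw new Error(`Unknown category: ${value}`);
    }
    return code;
  });
}

// Map integer codes back to the values they stand for.
function decodeWithOrder(order: string[], codes: number[]): string[] {
  return codes.map((code) => {
    const value = order[code];
    if (value === undefined) {
      throw new Error(`Unknown code: ${code}`);
    }
    return value;
  });
}

// Label encoding: codes follow the sorted distinct labels.
const labelOrder = sortedUnique(["cat", "dog", "bird", "cat", "bird"]);
const labelCodes = encodeWithOrder(labelOrder, ["bird", "cat", "dog"]);
const labelsBack = decodeWithOrder(labelOrder, labelCodes);

// Ordinal encoding with a hand-written ranking, so that low < medium < high.
const sizeOrder = ["low", "medium", "high"];
const sizeCodes = encodeWithOrder(sizeOrder, ["low", "high", "medium"]);

console.log(JSON.stringify(labelCodes)); // [0,1,2]
console.log(JSON.stringify(labelsBack)); // ["bird","cat","dog"]
console.log(JSON.stringify(sizeCodes)); // [0,2,1]
```

Decoding is just the reverse lookup, which is why a round trip returns the original labels as long as the same order is used for both directions.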
Source Code
17-preprocessing-encoders/index.ts
import { reshape, tensor } from "deepbox/ndarray";
import {
  LabelBinarizer,
  LabelEncoder,
  MultiLabelBinarizer,
  OneHotEncoder,
  OrdinalEncoder,
} from "deepbox/preprocess";

console.log("=== Preprocessing: Encoders ===\n");

// ---------------------------------------------------------------------------
// Part 1: LabelEncoder (map string labels to integers)
// ---------------------------------------------------------------------------
console.log("--- Part 1: LabelEncoder ---");

const le = new LabelEncoder();
le.fit(tensor(["cat", "dog", "bird", "cat", "bird"]));

const encoded = le.transform(tensor(["bird", "cat", "dog"]));
console.log("Encoded:", encoded.toString());

const decoded = le.inverseTransform(encoded);
console.log("Decoded:", decoded.toString());

// ---------------------------------------------------------------------------
// Part 2: OneHotEncoder (one-hot vectors for categorical features)
// ---------------------------------------------------------------------------
console.log("\n--- Part 2: OneHotEncoder ---");

const ohe = new OneHotEncoder();
ohe.fit(reshape(tensor(["red", "green", "blue", "red", "blue"]), [5, 1]));

const oneHot = ohe.transform(reshape(tensor(["red", "blue", "green"]), [3, 1]));
console.log("One-hot encoded shape:", oneHot.shape);
console.log("One-hot encoded:\n", oneHot.toString());

// ---------------------------------------------------------------------------
// Part 3: OrdinalEncoder (ordinal integer encoding)
// ---------------------------------------------------------------------------
console.log("\n--- Part 3: OrdinalEncoder ---");

const oe = new OrdinalEncoder();
oe.fit(reshape(tensor(["low", "medium", "high", "medium", "low"]), [5, 1]));

const ordinal = oe.transform(reshape(tensor(["low", "high", "medium"]), [3, 1]));
console.log("Ordinal encoded:", ordinal.toString());

const ordinalDecoded = oe.inverseTransform(ordinal);
console.log("Decoded:", ordinalDecoded.toString());

// ---------------------------------------------------------------------------
// Part 4: LabelBinarizer (binary indicator for multi-class labels)
// ---------------------------------------------------------------------------
console.log("\n--- Part 4: LabelBinarizer ---");

const lb = new LabelBinarizer();
lb.fit(tensor(["cat", "dog", "bird"]));

const binarized = lb.transform(tensor(["cat", "bird", "dog"]));
console.log("Binarized shape:", binarized.shape);
console.log("Binarized:\n", binarized.toString());

const binarizedDecoded = lb.inverseTransform(binarized);
console.log("Decoded:", binarizedDecoded.toString());

// ---------------------------------------------------------------------------
// Part 5: MultiLabelBinarizer (multi-label binary encoding)
// ---------------------------------------------------------------------------
console.log("\n--- Part 5: MultiLabelBinarizer ---");

const mlb = new MultiLabelBinarizer();
mlb.fit([["cat", "dog"], ["bird"], ["cat", "bird", "dog"]]);

const multiEncoded = mlb.transform([["cat", "bird"], ["dog"]]);
console.log("Multi-label encoded shape:", multiEncoded.shape);
console.log("Multi-label encoded:\n", multiEncoded.toString());

console.log("\n=== Preprocessing: Encoders Complete ===");
Console Output
$ npx tsx 17-preprocessing-encoders/index.ts
=== Preprocessing: Encoders ===
--- Part 1: LabelEncoder ---
Encoded: tensor([0, 1, 2], dtype=float64)
Decoded: tensor(["bird", "cat", "dog"], dtype=string)
--- Part 2: OneHotEncoder ---
One-hot encoded shape: [ 3, 3 ]
One-hot encoded:
tensor([[0, 0, 1]
[1, 0, 0]
[0, 1, 0]], dtype=float64)
--- Part 3: OrdinalEncoder ---
Ordinal encoded: tensor([[1]
[0]
[2]], dtype=float64)
Decoded: tensor([["low"]
["high"]
["medium"]], dtype=string)
--- Part 4: LabelBinarizer ---
Binarized shape: [ 3, 3 ]
Binarized:
tensor([[0, 1, 0]
[1, 0, 0]
[0, 0, 1]], dtype=float64)
Decoded: tensor(["cat", "bird", "dog"], dtype=string)
--- Part 5: MultiLabelBinarizer ---
Multi-label encoded shape: [ 2, 3 ]
Multi-label encoded:
tensor([[1, 1, 0]
[0, 0, 1]], dtype=float64)
=== Preprocessing: Encoders Complete ===
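The column order in Parts 2, 4, and 5 is consistent with categories being sorted alphabetically (blue, green, red and bird, cat, dog). The plain-TypeScript sketch below reproduces the Part 2 and Part 5 matrices under that assumption, which may help when reading the rows above. `oneHotRows` and `multiHotRows` are hypothetical helpers written for this illustration, not Deepbox functions, and the category lists are typed in by hand rather than fitted.

```typescript
// Hypothetical helpers for illustration only; none of these are Deepbox APIs.

// One row per value, with a single 1 in the column of that value's category.
function oneHotRows(categories: string[], values: string[]): number[][] {
  return values.map((value) => {
    if (categories.indexOf(value) === -1) {
      throw new Error(`Unknown category: ${value}`);
    }
    return categories.map((category) => (category === value ? 1 : 0));
  });
}

// One row per sample, with a 1 in every column whose category the sample contains.
function multiHotRows(categories: string[], samples: string[][]): number[][] {
  return samples.map((sample) =>
    categories.map((category) => (sample.indexOf(category) !== -1 ? 1 : 0)),
  );
}

// Assumed column orders: the distinct values from each part, sorted.
const colorColumns = ["blue", "green", "red"];
const animalColumns = ["bird", "cat", "dog"];

const colorRows = oneHotRows(colorColumns, ["red", "blue", "green"]);
const animalRows = multiHotRows(animalColumns, [["cat", "bird"], ["dog"]]);

console.log(JSON.stringify(colorRows)); // [[0,0,1],[1,0,0],[0,1,0]]
console.log(JSON.stringify(animalRows)); // [[1,1,0],[0,0,1]]
```

Each one-hot row sums to 1, while a multi-label row sums to the number of labels in that sample, which is the practical difference between Parts 2 and 4 on one side and Part 5 on the other.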