deepbox/preprocess

Scalers

Transform features to a common scale. All scalers follow the fit/transform pattern: fit() learns parameters from training data, transform() applies the transformation. Use fitTransform() as a shortcut.

StandardScaler

Standardize features by removing the mean and scaling to unit variance: z = (x − μ) / σ. Each feature is transformed independently. The most common scaler for scale-sensitive models such as SVMs, logistic regression, and neural networks.
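
The formula can be sketched in plain TypeScript for a single feature column (this is an illustration of the math, not deepbox's implementation; it assumes the population standard deviation, i.e. dividing by n):

```typescript
// Standardize one feature column: z = (x - mean) / std.
// Uses the population standard deviation (divide by n).
function standardize(column: number[]): number[] {
  const n = column.length;
  const mean = column.reduce((sum, v) => sum + v, 0) / n;
  const variance = column.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return column.map((v) => (v - mean) / std);
}
```

After standardization the column has mean 0 and unit variance, which is exactly what a scale-sensitive model sees.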

MinMaxScaler

Scale features to a given range [min, max] (default [0, 1]): x' = (x − x_min) / (x_max − x_min). Preserves the shape of the original distribution. Sensitive to outliers.
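
A minimal plain-TypeScript sketch of the default [0, 1] case (illustrative only, and it assumes the column is not constant, i.e. max > min):

```typescript
// Scale one feature column to [0, 1]: x' = (x - min) / (max - min).
// Assumes max > min; a constant column would divide by zero.
function minMaxScale(column: number[]): number[] {
  const lo = Math.min(...column);
  const hi = Math.max(...column);
  return column.map((v) => (v - lo) / (hi - lo));
}
```

Because min and max come straight from the data, a single extreme outlier compresses every other value toward one end of the range.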

RobustScaler

Scale features using statistics robust to outliers: x' = (x − median) / IQR. Uses median and interquartile range instead of mean and standard deviation.
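
A plain-TypeScript sketch of the formula (not deepbox's implementation; quartiles here use linear interpolation, which is one of several common conventions):

```typescript
// Linear-interpolation quantile over a pre-sorted array.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Scale one feature column: x' = (x - median) / (Q3 - Q1).
function robustScale(column: number[]): number[] {
  const sorted = [...column].sort((a, b) => a - b);
  const median = quantile(sorted, 0.5);
  const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  return column.map((v) => (v - median) / iqr);
}
```

Note that an outlier like 100 in a small column barely moves the median or IQR, so the remaining values keep a sensible scale.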

MaxAbsScaler

Scale each feature by its maximum absolute value: x' = x / max(|x|). Scales to [−1, 1] range. Does not shift/center the data (preserves sparsity).
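
A one-function plain-TypeScript sketch (illustrative, not deepbox's implementation):

```typescript
// Scale one feature column by max absolute value: x' = x / max(|x|).
// Zeros map to zero, so sparse data stays sparse.
function maxAbsScale(column: number[]): number[] {
  const m = Math.max(...column.map(Math.abs));
  return column.map((v) => v / m);
}
```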

PowerTransformer

Apply a power transform (Yeo-Johnson, or Box-Cox for strictly positive data) to make data more Gaussian-like. Useful when features have skewed distributions.
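
The Yeo-Johnson transform for a fixed λ can be sketched in plain TypeScript. A real power transformer estimates λ from the data (typically by maximum likelihood); here it is a plain parameter so the sketch stays focused on the formula itself:

```typescript
// Yeo-Johnson transform of a single value for a fixed lambda.
// Piecewise: positive branch uses lambda, negative branch uses 2 - lambda,
// with log limits at lambda = 0 and lambda = 2 respectively.
function yeoJohnson(x: number, lambda: number): number {
  if (x >= 0) {
    return lambda !== 0 ? ((x + 1) ** lambda - 1) / lambda : Math.log(x + 1);
  }
  return lambda !== 2
    ? -(((-x + 1) ** (2 - lambda)) - 1) / (2 - lambda)
    : -Math.log(-x + 1);
}
```

With λ = 1 the transform is the identity, so the estimated λ measures how much correction the skewed feature needed.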

QuantileTransformer

Transform features to follow a uniform or normal distribution using quantile information. Robust to outliers but can distort correlations between features.
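
The uniform-output mode can be sketched as mapping each value to its position in the empirical CDF (a simplification: ties take the first matching rank, and real implementations interpolate between learned quantiles):

```typescript
// Map each value to its empirical-CDF position in [0, 1].
// Simplified sketch: assumes at least two distinct values.
function quantileUniform(column: number[]): number[] {
  const sorted = [...column].sort((a, b) => a - b);
  return column.map((v) => sorted.indexOf(v) / (sorted.length - 1));
}
```

Ranks depend only on ordering, not magnitude, which is why the output is robust to outliers but discards inter-feature scale relationships.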

Normalizer

Normalize samples (rows) to unit norm: x' = x / ‖x‖. Supports L1, L2, and max norms. Operates on rows, not columns.
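
A plain-TypeScript sketch of the L2 case for one sample (illustrative; a nonzero row is assumed):

```typescript
// Normalize one sample (row) to unit L2 norm: x' = x / ||x||_2.
function l2Normalize(row: number[]): number[] {
  const norm = Math.sqrt(row.reduce((sum, v) => sum + v * v, 0));
  return row.map((v) => v / norm);
}
```

Because it acts per row rather than per column, there is nothing to fit: each sample is normalized independently.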

Scaler API (Common Methods)

  • .fit(X: Tensor) — Learn scaling parameters from training data
  • .transform(X: Tensor) — Apply learned transformation
  • .fitTransform(X: Tensor) — fit() + transform() in one call
  • .inverseTransform(X: Tensor) — Reverse the transformation

Standard

z = (x − μ) / σ

Where:

  • μ = Feature mean
  • σ = Feature std

MinMax

x' = (x − x_min) / (x_max − x_min)

Where:

  • x_min, x_max = Feature min/max from training data

Robust

x' = (x − median) / IQR

Where:

  • IQR = Q3 − Q1 (interquartile range)
scalers.ts
import { StandardScaler, MinMaxScaler, RobustScaler } from "deepbox/preprocess";
import { tensor } from "deepbox/ndarray";

const X = tensor([[1, 2], [3, 4], [5, 6], [7, 8]]);

// StandardScaler: mean=0, std=1
const ss = new StandardScaler();
ss.fit(X);
const XStd = ss.transform(X);            // Standardized
const XOrig = ss.inverseTransform(XStd); // Back to original

// MinMaxScaler: scale to [0, 1]
const mms = new MinMaxScaler();
const XNorm = mms.fitTransform(X);       // fit + transform in one call

// RobustScaler: robust to outliers
const rs = new RobustScaler();
const XRobust = rs.fitTransform(X);

Choosing a Scaler

  • StandardScaler — Default choice for most models. Assumes roughly Gaussian features.
  • MinMaxScaler — When you need bounded output (e.g., [0, 1] for neural networks).
  • RobustScaler — When data contains outliers that would distort mean/std.
  • MaxAbsScaler — When you want to preserve sparsity (zero values remain zero).
  • PowerTransformer — When features are skewed and you need Gaussian-like input.
  • QuantileTransformer — When you need a specific output distribution regardless of input.