Example 12: Complete ML Pipeline
This example brings every stage of a machine learning project into one cohesive script: load a built-in dataset, split it into training and test sets, scale the features, train a regression model, evaluate it with multiple metrics, and generate an SVG prediction plot.
Deepbox Modules Used
- deepbox/datasets
- deepbox/ml
- deepbox/metrics
- deepbox/ndarray
- deepbox/preprocess
- deepbox/stats
- deepbox/plot

What You Will Learn
- Compose load → split → scale → train → evaluate → visualize in one script
- Use multiple metrics (R², MSE, MAE) for robust performance estimates
- Generate SVG plots server-side for reports
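Step 5 of the script below reports R², MSE, and MAE via `deepbox/metrics`. Stripped of the library, these three metrics reduce to a few lines of arithmetic. The following is a plain-TypeScript sketch for reference, not deepbox's actual implementation:

```typescript
// Mean squared error: average of squared residuals.
function mse(yTrue: number[], yPred: number[]): number {
  return yTrue.reduce((s, y, i) => s + (y - yPred[i]) ** 2, 0) / yTrue.length;
}

// Mean absolute error: average of absolute residuals.
function mae(yTrue: number[], yPred: number[]): number {
  return yTrue.reduce((s, y, i) => s + Math.abs(y - yPred[i]), 0) / yTrue.length;
}

// R² score: 1 minus (residual sum of squares / total sum of squares).
function r2Score(yTrue: number[], yPred: number[]): number {
  const meanY = yTrue.reduce((s, y) => s + y, 0) / yTrue.length;
  const ssRes = yTrue.reduce((s, y, i) => s + (y - yPred[i]) ** 2, 0);
  const ssTot = yTrue.reduce((s, y) => s + (y - meanY) ** 2, 0);
  return 1 - ssRes / ssTot;
}

const yTrue = [3, 5, 7, 9];
const yPred = [2.8, 5.1, 7.2, 8.9];
console.log(mse(yTrue, yPred).toFixed(4)); // 0.0250
console.log(mae(yTrue, yPred).toFixed(4)); // 0.1500
console.log(r2Score(yTrue, yPred).toFixed(4)); // 0.9950
```

The three metrics disagree in useful ways: MSE punishes large errors quadratically, MAE weights all errors equally, and R² normalizes against a predict-the-mean baseline, which is why the script reports all of them.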
Source Code
12-complete-pipeline/index.ts
```ts
import { mkdirSync, writeFileSync } from "node:fs";
import { loadHousingMini } from "deepbox/datasets";
import { mae, mse, r2Score } from "deepbox/metrics";
import { Ridge } from "deepbox/ml";
import { tensor } from "deepbox/ndarray";
import { Figure } from "deepbox/plot";
import { StandardScaler, trainTestSplit } from "deepbox/preprocess";
import { mean, std } from "deepbox/stats";

console.log("=".repeat(60));
console.log("Example 12: Complete Machine Learning Pipeline");
console.log("=".repeat(60));

mkdirSync("docs/examples/12-complete-pipeline/output", { recursive: true });

// Step 1: Load Data
console.log("\n📦 Step 1: Loading Dataset");
console.log("-".repeat(60));

const dataset = loadHousingMini();
console.log(`✓ Loaded Housing-Mini dataset`);
console.log(`  Samples: ${dataset.data.shape[0]}`);
console.log(`  Features: ${dataset.data.shape[1]}`);

// Step 2: Exploratory Data Analysis
console.log("\n📊 Step 2: Exploratory Data Analysis");
console.log("-".repeat(60));

// Extract the first feature (column 0) for analysis
const feature_data: number[] = [];
const numFeatures = dataset.data.shape[1] || 0;
for (let i = 0; i < dataset.data.shape[0]; i++) {
  feature_data.push(Number(dataset.data.data[dataset.data.offset + i * numFeatures]));
}
const feature = tensor(feature_data);

const meanVal = Number(mean(feature).data[0]);
const stdVal = Number(std(feature).data[0]);

console.log(`Feature 1 Statistics:`);
console.log(`  Mean: ${meanVal.toFixed(2)}`);
console.log(`  Std: ${stdVal.toFixed(2)}`);

// Step 3: Data Preprocessing
console.log("\n🔄 Step 3: Data Preprocessing");
console.log("-".repeat(60));

const [X_train, X_test, y_train, y_test] = trainTestSplit(dataset.data, dataset.target, {
  testSize: 0.2,
  randomState: 42,
  shuffle: true,
});

console.log(`✓ Split data:`);
console.log(`  Training: ${X_train.shape[0]} samples`);
console.log(`  Testing: ${X_test.shape[0]} samples`);

// Fit the scaler on training data only, then apply it to both splits
const scaler = new StandardScaler();
scaler.fit(X_train);
const X_train_scaled = scaler.transform(X_train);
const X_test_scaled = scaler.transform(X_test);

console.log(`✓ Scaled features using StandardScaler`);

// Step 4: Model Training
console.log("\n🤖 Step 4: Model Training");
console.log("-".repeat(60));

const model = new Ridge({ alpha: 1.0 });
model.fit(X_train_scaled, y_train);

console.log(`✓ Trained Ridge Regression (α=1.0)`);

// Step 5: Model Evaluation
console.log("\n📈 Step 5: Model Evaluation");
console.log("-".repeat(60));

const y_pred = model.predict(X_test_scaled);

const r2 = r2Score(y_test, y_pred);
const mseVal = mse(y_test, y_pred);
const maeVal = mae(y_test, y_pred);

console.log(`Performance Metrics:`);
console.log(`  R² Score: ${r2.toFixed(4)}`);
console.log(`  MSE: ${mseVal.toFixed(4)}`);
console.log(`  MAE: ${maeVal.toFixed(4)}`);

// Step 6: Visualization
console.log("\n🎨 Step 6: Results Visualization");
console.log("-".repeat(60));

// Extract predictions and actual values
const y_test_array: number[] = [];
const y_pred_array: number[] = [];

for (let i = 0; i < y_test.size; i++) {
  y_test_array.push(Number(y_test.data[y_test.offset + i]));
  y_pred_array.push(Number(y_pred.data[y_pred.offset + i]));
}

// Create predictions vs actual plot
const fig = new Figure({ width: 640, height: 480 });
const ax = fig.addAxes();

ax.scatter(tensor(y_test_array), tensor(y_pred_array), {
  color: "#1f77b4",
  size: 8,
});
ax.plot(tensor([0, 1, 2]), tensor([0, 1, 2]), {
  color: "#ff0000",
  linewidth: 2,
});
ax.setTitle("Predictions vs Actual Values");
ax.setXLabel("Actual");
ax.setYLabel("Predicted");

const svg = fig.renderSVG();
writeFileSync("docs/examples/12-complete-pipeline/output/predictions.svg", svg.svg);
console.log("✓ Saved: output/predictions.svg");

// Step 7: Summary
console.log("\n📋 Step 7: Pipeline Summary");
console.log("-".repeat(60));

console.log(`Complete ML Pipeline Executed:`);
console.log(`  1. ✓ Data Loading (Housing-Mini dataset)`);
console.log(`  2. ✓ Exploratory Analysis`);
console.log(`  3. ✓ Train/Test Split (80/20)`);
console.log(`  4. ✓ Feature Scaling (StandardScaler)`);
console.log(`  5. ✓ Model Training (Ridge Regression)`);
console.log(`  6. ✓ Model Evaluation (R²=${r2.toFixed(3)})`);
console.log(`  7. ✓ Results Visualization`);

console.log("\n💡 Key Takeaways:");
console.log("• Always split data before scaling to prevent data leakage");
console.log("• Feature scaling improves model performance");
console.log("• Use multiple metrics to evaluate models");
console.log("• Visualize results to understand model behavior");
console.log("• Ridge regression adds L2 regularization to prevent overfitting");

console.log(`\n${"=".repeat(60)}`);
console.log("✅ Complete ML Pipeline Finished Successfully!");
console.log("=".repeat(60));
```

Console Output
$ npx tsx 12-complete-pipeline/index.ts
============================================================
Example 12: Complete Machine Learning Pipeline
============================================================
📦 Step 1: Loading Dataset
------------------------------------------------------------
✓ Loaded Housing-Mini dataset
Samples: 200
Features: 4
📊 Step 2: Exploratory Data Analysis
------------------------------------------------------------
Feature 1 Statistics:
Mean: 114.01
Std: 51.80
🔄 Step 3: Data Preprocessing
------------------------------------------------------------
✓ Split data:
Training: 160 samples
Testing: 40 samples
✓ Scaled features using StandardScaler
🤖 Step 4: Model Training
------------------------------------------------------------
✓ Trained Ridge Regression (α=1.0)
📈 Step 5: Model Evaluation
------------------------------------------------------------
Performance Metrics:
R² Score: 0.9921
MSE: 66.9568
MAE: 7.3285
🎨 Step 6: Results Visualization
------------------------------------------------------------
✓ Saved: output/predictions.svg
📋 Step 7: Pipeline Summary
------------------------------------------------------------
Complete ML Pipeline Executed:
1. ✓ Data Loading (Housing-Mini dataset)
2. ✓ Exploratory Analysis
3. ✓ Train/Test Split (80/20)
4. ✓ Feature Scaling (StandardScaler)
5. ✓ Model Training (Ridge Regression)
6. ✓ Model Evaluation (R²=0.992)
7. ✓ Results Visualization
💡 Key Takeaways:
• Always split data before scaling to prevent data leakage
• Feature scaling improves model performance
• Use multiple metrics to evaluate models
• Visualize results to understand model behavior
• Ridge regression adds L2 regularization to prevent overfitting
============================================================
✅ Complete ML Pipeline Finished Successfully!
============================================================
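The first takeaway above, fit the scaler on the training split only, can be shown numerically. Here is a plain-TypeScript sketch with toy numbers (independent of deepbox) of how computing scaler statistics on the full dataset leaks test-set information into preprocessing:

```typescript
// Why "split before scaling" matters: a scaler fit on ALL data absorbs
// test-set statistics, so the test set is no longer truly unseen.
const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;

const train = [10, 12, 14, 16]; // toy training feature values
const test = [100];             // a test point far outside the training range

// Correct: statistics come from the training split only.
const trainMean = mean(train); // 13

// Leaky: statistics computed on train + test together.
const leakyMean = mean([...train, ...test]); // 30.4

console.log(trainMean, leakyMean); // 13 30.4
// The leaky mean is dragged toward the test data, so every scaled training
// value already encodes information the model should never see at fit time.
```

The same reasoning applies to the standard deviation: this is exactly why the script calls `scaler.fit(X_train)` and then only `scaler.transform(X_test)`, never `fit` on the test split.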