10
Clustering
KNN
PCA
Naive Bayes
Advanced ML Models
Beyond linear models, Deepbox provides a full suite of classical ML algorithms. This example demonstrates four distinct paradigms: KMeans clustering, K-Nearest Neighbors for classification and regression, PCA for dimensionality reduction, and Gaussian Naive Bayes for probabilistic classification.
Deepbox Modules Used
- deepbox/ml
- deepbox/ndarray
- deepbox/metrics
- deepbox/preprocess

What You Will Learn
- KMeans discovers k clusters by minimizing within-cluster variance (inertia)
- KNN classifies by majority vote of the k nearest training points
- PCA projects data onto directions of maximum variance
- GaussianNB applies Bayes' theorem assuming Gaussian feature distributions
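To make the KNN bullet concrete before diving into the Deepbox API, here is a minimal majority-vote sketch in plain TypeScript with no Deepbox dependency; the data, helper names, and `k` value are made up for illustration.

```typescript
// Minimal KNN majority-vote sketch (plain TypeScript, no Deepbox).
type Point = number[];

const euclidean = (a: Point, b: Point): number =>
  Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));

const knnPredict = (X: Point[], y: number[], query: Point, k: number): number => {
  // Rank all training points by distance to the query, keep the k nearest.
  const nearest = X.map((x, i) => ({ dist: euclidean(x, query), label: y[i] }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);
  // Majority vote among the k nearest labels.
  const votes = new Map<number, number>();
  for (const { label } of nearest) votes.set(label, (votes.get(label) ?? 0) + 1);
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
};

const X = [[0, 0], [1, 1], [2, 2], [5, 5], [6, 6], [7, 7]];
const y = [0, 0, 0, 1, 1, 1];
console.log(knnPredict(X, y, [1.5, 1.5], 3)); // → 0
console.log(knnPredict(X, y, [6.5, 6.5], 3)); // → 1
```

`KNeighborsClassifier` below does the same thing over tensors, with `nNeighbors` playing the role of `k`.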
Source Code
10-advanced-ml-models/index.ts
import { isNumericTypedArray, isTypedArray } from "deepbox/core";
import { accuracy } from "deepbox/metrics";
import { GaussianNB, KMeans, KNeighborsClassifier, KNeighborsRegressor, PCA } from "deepbox/ml";
import { tensor } from "deepbox/ndarray";
import { trainTestSplit } from "deepbox/preprocess";

const expectNumericTypedArray = (
  value: unknown
): Float32Array | Float64Array | Int32Array | Uint8Array => {
  if (!isTypedArray(value) || !isNumericTypedArray(value)) {
    throw new Error("Expected numeric typed array");
  }
  return value;
};

console.log("=".repeat(60));
console.log("Example 10: Advanced ML Models");
console.log("=".repeat(60));

// ============================================================================
// Part 1: KMeans Clustering
// ============================================================================
console.log("\n📦 Part 1: KMeans Clustering");
console.log("-".repeat(60));

const clusterData = tensor([
  [1, 2],
  [1.5, 1.8],
  [5, 8],
  [8, 8],
  [1, 0.6],
  [9, 11],
  [8, 2],
  [10, 2],
  [9, 3],
]);

const kmeans = new KMeans({ nClusters: 3, randomState: 42 });
kmeans.fit(clusterData);

const clusterLabels = kmeans.predict(clusterData);
console.log("Cluster labels:", clusterLabels.toString());
console.log("Cluster centers shape:", kmeans.clusterCenters.shape);
console.log("Inertia:", kmeans.inertia.toFixed(4));
console.log("Number of iterations:", kmeans.nIter);

// ============================================================================
// Part 2: K-Nearest Neighbors Classification
// ============================================================================
console.log("\n📦 Part 2: K-Nearest Neighbors Classification");
console.log("-".repeat(60));

const XClass = tensor([
  [0, 0],
  [1, 1],
  [2, 2],
  [3, 3],
  [4, 4],
  [5, 5],
  [6, 6],
  [7, 7],
]);
const yClass = tensor([0, 0, 0, 0, 1, 1, 1, 1]);

const [XTrainKNN, XTestKNN, yTrainKNN, yTestKNN] = trainTestSplit(XClass, yClass, {
  testSize: 0.25,
  randomState: 42,
});

const knnClassifier = new KNeighborsClassifier({ nNeighbors: 3 });
knnClassifier.fit(XTrainKNN, yTrainKNN);

const yPredKNN = knnClassifier.predict(XTestKNN);
const knnAccuracy = accuracy(yTestKNN, yPredKNN);

console.log("KNN Classifier trained with k=3");
console.log("Test accuracy:", `${(Number(knnAccuracy) * 100).toFixed(2)}%`);

const probabilities = knnClassifier.predictProba(XTestKNN);
console.log("Prediction probabilities shape:", probabilities.shape);

// ============================================================================
// Part 3: K-Nearest Neighbors Regression
// ============================================================================
console.log("\n📦 Part 3: K-Nearest Neighbors Regression");
console.log("-".repeat(60));

const XReg = tensor([[0], [1], [2], [3], [4], [5]]);
const yReg = tensor([0, 1, 4, 9, 16, 25]);

const knnRegressor = new KNeighborsRegressor({ nNeighbors: 2 });
knnRegressor.fit(XReg, yReg);

const yPredReg = knnRegressor.predict(tensor([[2.5], [3.5]]));
console.log("Predictions for [2.5] and [3.5]:", yPredReg.toString());

const r2Score = knnRegressor.score(XReg, yReg);
console.log("R² score:", r2Score.toFixed(4));

// ============================================================================
// Part 4: PCA (Dimensionality Reduction)
// ============================================================================
console.log("\n📦 Part 4: PCA - Dimensionality Reduction");
console.log("-".repeat(60));

const XPca = tensor([
  [2.5, 2.4, 1.1],
  [0.5, 0.7, 0.3],
  [2.2, 2.9, 1.5],
  [1.9, 2.2, 0.9],
  [3.1, 3.0, 1.8],
  [2.3, 2.7, 1.2],
  [2.0, 1.6, 0.8],
  [1.0, 1.1, 0.5],
  [1.5, 1.6, 0.7],
  [1.1, 0.9, 0.4],
]);

const pca = new PCA({ nComponents: 2 });
pca.fit(XPca);

const XTransformed = pca.transform(XPca);
console.log("Original shape:", XPca.shape);
console.log("Transformed shape:", XTransformed.shape);
console.log("Explained variance ratio:", pca.explainedVarianceRatio.toString());

const varianceData = expectNumericTypedArray(pca.explainedVarianceRatio.data);
const totalVariance = Array.from(varianceData).reduce((a, b) => a + b, 0);
console.log("Total variance explained:", `${(totalVariance * 100).toFixed(2)}%`);

// Reconstruct data
const XReconstructed = pca.inverseTransform(XTransformed);
console.log("Reconstructed shape:", XReconstructed.shape);

// ============================================================================
// Part 5: Gaussian Naive Bayes
// ============================================================================
console.log("\n📦 Part 5: Gaussian Naive Bayes");
console.log("-".repeat(60));

const XNB = tensor([
  [1, 2],
  [2, 3],
  [3, 4],
  [4, 5],
  [5, 6],
  [6, 7],
  [7, 8],
  [8, 9],
]);
const yNB = tensor([0, 0, 0, 0, 1, 1, 1, 1]);

const [XTrainNB, XTestNB, yTrainNB, yTestNB] = trainTestSplit(XNB, yNB, {
  testSize: 0.25,
  randomState: 42,
});

const nb = new GaussianNB();
nb.fit(XTrainNB, yTrainNB);

const yPredNB = nb.predict(XTestNB);
const nbAccuracy = accuracy(yTestNB, yPredNB);

console.log("Gaussian Naive Bayes trained");
console.log("Test accuracy:", `${(Number(nbAccuracy) * 100).toFixed(2)}%`);

const nbProba = nb.predictProba(XTestNB);
console.log("Prediction probabilities shape:", nbProba.shape);

// ============================================================================
// Summary
// ============================================================================
console.log("\n💡 Key Takeaways");
console.log("-".repeat(60));
console.log("• KMeans: Unsupervised clustering for grouping similar data points");
console.log("• KNN: Instance-based learning for classification and regression");
console.log("• PCA: Dimensionality reduction while preserving variance");
console.log("• Naive Bayes: Probabilistic classifier based on Bayes' theorem");
console.log("• All models follow the fit/predict/score API");

console.log("\n✅ Advanced ML Models Example Complete!");
console.log("=".repeat(60));

Console Output
$ npx tsx 10-advanced-ml-models/index.ts
============================================================
Example 10: Advanced ML Models
============================================================
📦 Part 1: KMeans Clustering
------------------------------------------------------------
Cluster labels: tensor([2, 2, 1, ..., 0, 0, 0], dtype=int32)
Cluster centers shape: [ 3, 2 ]
Inertia: 18.6467
Number of iterations: 2
📦 Part 2: K-Nearest Neighbors Classification
------------------------------------------------------------
KNN Classifier trained with k=3
Test accuracy: 100.00%
Prediction probabilities shape: [ 2, 2 ]
📦 Part 3: K-Nearest Neighbors Regression
------------------------------------------------------------
Predictions for [2.5] and [3.5]: tensor([6.500, 12.50], dtype=float32)
R² score: 0.9126
📦 Part 4: PCA - Dimensionality Reduction
------------------------------------------------------------
Original shape: [ 10, 3 ]
Transformed shape: [ 10, 2 ]
Explained variance ratio: tensor([0.9602, 0.03214], dtype=float32)
Total variance explained: 99.23%
Reconstructed shape: [ 10, 3 ]
📦 Part 5: Gaussian Naive Bayes
------------------------------------------------------------
Gaussian Naive Bayes trained
Test accuracy: 100.00%
Prediction probabilities shape: [ 2, 2 ]
💡 Key Takeaways
------------------------------------------------------------
• KMeans: Unsupervised clustering for grouping similar data points
• KNN: Instance-based learning for classification and regression
• PCA: Dimensionality reduction while preserving variance
• Naive Bayes: Probabilistic classifier based on Bayes' theorem
• All models follow the fit/predict/score API
✅ Advanced ML Models Example Complete!
============================================================
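To close the loop on Part 1, the inertia that KMeans reports can be computed by hand: it is the sum of squared distances from each point to its assigned cluster center. The sketch below uses plain TypeScript with made-up points, labels, and centers (not Deepbox output).

```typescript
// Within-cluster sum of squared distances (inertia), computed by hand.
type Point = number[];

const sqDist = (a: Point, b: Point): number =>
  a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0);

const inertia = (X: Point[], labels: number[], centers: Point[]): number =>
  X.reduce((sum, x, i) => sum + sqDist(x, centers[labels[i]]), 0);

// Two tight clusters around (0, 0) and (10, 10), with their true centroids.
const X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]];
const labels = [0, 0, 0, 1, 1, 1];
const centers = [[1 / 3, 1 / 3], [31 / 3, 31 / 3]];
console.log(inertia(X, labels, centers).toFixed(4)); // → 2.6667
```

KMeans iteratively moves the centers to reduce exactly this quantity, which is why a lower inertia (for a fixed k) indicates tighter clusters.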