Model Evaluation Metrics
Choosing the right evaluation metric is as important as choosing the right model. This example demonstrates classification metrics (accuracy, precision, recall, F1, confusion matrix), regression metrics (MSE, RMSE, MAE, MAPE, R²), and clustering metrics (silhouette score).
Deepbox Modules Used
- deepbox/ndarray
- deepbox/metrics
What You Will Learn
- Accuracy is misleading for imbalanced data — use F1 or balanced accuracy
- Precision is the fraction of predicted positives that are correct (penalizes false positives); recall is the fraction of actual positives that are found (penalizes false negatives)
- MSE penalizes large errors quadratically; MAE treats all errors equally
- R² close to 1.0 means the model explains most variance in the data
- Silhouette score evaluates clustering quality without ground truth labels
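To make the classification definitions concrete, here is a minimal hand-rolled sketch of accuracy, precision, recall, and F1 computed from confusion-matrix counts. This is plain TypeScript for illustration, not deepbox's implementation; it uses the same labels as the example below.

```typescript
// Binary-classification metrics computed from TP/TN/FP/FN counts.
function classificationMetrics(yTrue: number[], yPred: number[]) {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yTrue[i] === 1 && yPred[i] === 1) tp++;
    else if (yTrue[i] === 0 && yPred[i] === 0) tn++;
    else if (yTrue[i] === 0 && yPred[i] === 1) fp++;
    else fn++;
  }
  const accuracy = (tp + tn) / yTrue.length;
  const precision = tp / (tp + fp); // of predicted positives, how many are right
  const recall = tp / (tp + fn);    // of actual positives, how many were found
  const f1 = (2 * precision * recall) / (precision + recall); // harmonic mean
  return { accuracy, precision, recall, f1 };
}

// Same labels as the deepbox example below:
const m = classificationMetrics(
  [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
  [1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
);
console.log(m); // accuracy 0.8; precision, recall, f1 ≈ 0.8333
```

With these labels the counts are TP=5, TN=3, FP=1, FN=1, which reproduces the 80% accuracy and 83.33% precision/recall/F1 printed by the example.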
Source Code
24-metrics/index.ts
```typescript
import {
  // Classification metrics
  accuracy,
  confusionMatrix,
  f1Score,
  mae,
  mape,
  mse,
  precision,
  // Regression metrics
  r2Score,
  recall,
  rmse,
  // Clustering metrics
  silhouetteScore,
} from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

console.log("=== Model Evaluation Metrics ===\n");

// Classification Metrics
console.log("1. Classification Metrics:");
console.log("-".repeat(50));

const y_true_class = tensor([1, 0, 1, 1, 0, 1, 0, 0, 1, 1]);
const y_pred_class = tensor([1, 0, 1, 0, 0, 1, 0, 1, 1, 1]);

console.log("True labels:", y_true_class.toString());
console.log("Predictions:", `${y_pred_class.toString()}\n`);

const acc = accuracy(y_true_class, y_pred_class);
const prec = precision(y_true_class, y_pred_class);
const rec = recall(y_true_class, y_pred_class);
const f1 = f1Score(y_true_class, y_pred_class);

console.log(`Accuracy: ${(Number(acc) * 100).toFixed(2)}%`);
console.log(`Precision: ${(Number(prec) * 100).toFixed(2)}%`);
console.log(`Recall: ${(Number(rec) * 100).toFixed(2)}%`);
console.log(`F1-Score: ${(Number(f1) * 100).toFixed(2)}%\n`);

const cm = confusionMatrix(y_true_class, y_pred_class);
console.log("Confusion Matrix:");
console.log(cm.toString());
console.log("Format: [[TN, FP], [FN, TP]]\n");

// Regression Metrics
console.log("2. Regression Metrics:");
console.log("-".repeat(50));

const y_true_reg = tensor([3.0, -0.5, 2.0, 7.0, 4.2]);
const y_pred_reg = tensor([2.5, 0.0, 2.1, 7.8, 4.0]);

console.log("True values:", y_true_reg.toString());
console.log("Predictions:", `${y_pred_reg.toString()}\n`);

const r2 = r2Score(y_true_reg, y_pred_reg);
const mseVal = mse(y_true_reg, y_pred_reg);
const rmseVal = rmse(y_true_reg, y_pred_reg);
const maeVal = mae(y_true_reg, y_pred_reg);
const mapeVal = mape(y_true_reg, y_pred_reg);

console.log(`R² Score: ${r2.toFixed(4)}`);
console.log(`MSE: ${mseVal.toFixed(4)}`);
console.log(`RMSE: ${rmseVal.toFixed(4)}`);
console.log(`MAE: ${maeVal.toFixed(4)}`);
// mape already returns a percentage, so don't multiply by 100 again
console.log(`MAPE: ${mapeVal.toFixed(2)}%\n`);

// Clustering Metrics
console.log("3. Clustering Metrics:");
console.log("-".repeat(50));

const X_cluster = tensor([
  [1, 2],
  [1.5, 1.8],
  [5, 8],
  [8, 8],
  [1, 0.6],
  [9, 11],
]);
const labels = tensor([0, 0, 1, 1, 0, 1]);

const silhouette = silhouetteScore(X_cluster, labels);
console.log(`Silhouette Score: ${silhouette.toFixed(4)}`);
console.log("Range: [-1, 1], higher is better");
console.log("Measures how similar points are to their own cluster\n");

console.log("Metric Selection Guide:");
console.log("• Classification: Use F1-score for imbalanced data");
console.log("• Regression: R² for variance explained, MAE for interpretability");
console.log("• Clustering: Silhouette for cluster quality");

console.log("\n✓ Metrics evaluation complete!");
```
Console Output
$ npx tsx 24-metrics/index.ts
=== Model Evaluation Metrics ===
1. Classification Metrics:
--------------------------------------------------
True labels: tensor([1, 0, 1, ..., 0, 1, 1], dtype=float32)
Predictions: tensor([1, 0, 1, ..., 1, 1, 1], dtype=float32)
Accuracy: 80.00%
Precision: 83.33%
Recall: 83.33%
F1-Score: 83.33%
Confusion Matrix:
tensor([[3, 1]
[1, 5]], dtype=float32)
Format: [[TN, FP], [FN, TP]]
2. Regression Metrics:
--------------------------------------------------
True values: tensor([3, -0.5000, 2, 7, 4.200], dtype=float32)
Predictions: tensor([2.500, 0, 2.100, 7.800, 4], dtype=float32)
R² Score: 0.9611
MSE: 0.2380
RMSE: 0.4879
MAE: 0.4200
MAPE: 27.57%
3. Clustering Metrics:
--------------------------------------------------
Silhouette Score: 0.7480
Range: [-1, 1], higher is better
Measures how similar points are to their own cluster
Metric Selection Guide:
• Classification: Use F1-score for imbalanced data
• Regression: R² for variance explained, MAE for interpretability
• Clustering: Silhouette for cluster quality
✓ Metrics evaluation complete!
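The regression numbers above can be verified by hand from the metric formulas. Here is a short sketch (plain TypeScript, not deepbox's implementation) that recomputes MSE, RMSE, MAE, and R² for the same five data points.

```typescript
// Regression metrics from first principles.
function regressionMetrics(yTrue: number[], yPred: number[]) {
  const n = yTrue.length;
  const errors = yTrue.map((y, i) => y - yPred[i]);
  const mse = errors.reduce((s, e) => s + e * e, 0) / n;       // mean squared error
  const rmse = Math.sqrt(mse);                                  // same units as y
  const mae = errors.reduce((s, e) => s + Math.abs(e), 0) / n;  // mean absolute error
  const mean = yTrue.reduce((s, y) => s + y, 0) / n;
  const ssTot = yTrue.reduce((s, y) => s + (y - mean) ** 2, 0); // total variance
  const r2 = 1 - (mse * n) / ssTot;                             // 1 − SS_res / SS_tot
  return { mse, rmse, mae, r2 };
}

// Same data as the example:
const r = regressionMetrics([3.0, -0.5, 2.0, 7.0, 4.2], [2.5, 0.0, 2.1, 7.8, 4.0]);
console.log(r); // mse 0.238, rmse ≈ 0.4879, mae ≈ 0.42, r2 ≈ 0.9611
```

Note that R² compares the residual error against the variance of the targets themselves, which is why a value near 1.0 means the model explains most of the variance in the data.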