Model Evaluation Metrics
Choosing the right evaluation metric is as important as choosing the right model. This example demonstrates classification metrics (accuracy, precision, recall, F1, confusion matrix), regression metrics (MSE, RMSE, MAE, MAPE, R²), and clustering metrics (silhouette score).
Deepbox Modules Used
- deepbox/ndarray
- deepbox/metrics
What You Will Learn
- Accuracy is misleading for imbalanced data — use F1 or balanced accuracy
- Precision is the fraction of predicted positives that are correct (penalizes false positives); recall is the fraction of actual positives that are found (penalizes false negatives)
- MSE penalizes large errors quadratically; MAE treats all errors equally
- R² close to 1.0 means the model explains most variance in the data
- Silhouette score evaluates clustering quality without ground truth labels
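To make the classification definitions concrete, here is a minimal hand-rolled sketch of accuracy, precision, recall, and F1 computed from confusion-matrix counts. This is plain TypeScript for illustration, not deepbox's implementation; it uses the same labels as the example below.

```typescript
// Binary-classification metrics computed from TP/TN/FP/FN counts.
function classificationMetrics(yTrue: number[], yPred: number[]) {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yTrue[i] === 1 && yPred[i] === 1) tp++;
    else if (yTrue[i] === 0 && yPred[i] === 0) tn++;
    else if (yTrue[i] === 0 && yPred[i] === 1) fp++;
    else fn++;
  }
  const accuracy = (tp + tn) / yTrue.length;
  const precision = tp / (tp + fp); // of predicted positives, how many are right
  const recall = tp / (tp + fn);    // of actual positives, how many were found
  const f1 = (2 * precision * recall) / (precision + recall); // harmonic mean
  return { accuracy, precision, recall, f1 };
}

// Same labels as the deepbox example below:
const m = classificationMetrics(
  [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
  [1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
);
console.log(m); // accuracy 0.8; precision, recall, f1 ≈ 0.8333
```

With these labels the counts are TP=5, TN=3, FP=1, FN=1, which reproduces the 80% accuracy and 83.33% precision/recall/F1 printed by the example.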
Source Code
24-metrics/index.ts
```typescript
import {
  // Classification metrics
  accuracy,
  confusionMatrix,
  f1Score,
  mae,
  mape,
  mse,
  precision,
  // Regression metrics
  r2Score,
  recall,
  rmse,
  // Clustering metrics
  silhouetteScore,
} from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

console.log("=== Model Evaluation Metrics ===\n");

// Classification Metrics
console.log("1. Classification Metrics:");
console.log("-".repeat(50));

const y_true_class = tensor([1, 0, 1, 1, 0, 1, 0, 0, 1, 1]);
const y_pred_class = tensor([1, 0, 1, 0, 0, 1, 0, 1, 1, 1]);

console.log("True labels:", y_true_class.toString());
console.log("Predictions:", `${y_pred_class.toString()}\n`);

const acc = accuracy(y_true_class, y_pred_class);
const prec = precision(y_true_class, y_pred_class);
const rec = recall(y_true_class, y_pred_class);
const f1 = f1Score(y_true_class, y_pred_class);

console.log(`Accuracy: ${(Number(acc) * 100).toFixed(2)}%`);
console.log(`Precision: ${(Number(prec) * 100).toFixed(2)}%`);
console.log(`Recall: ${(Number(rec) * 100).toFixed(2)}%`);
console.log(`F1-Score: ${(Number(f1) * 100).toFixed(2)}%\n`);

const cm = confusionMatrix(y_true_class, y_pred_class);
console.log("Confusion Matrix:");
console.log(cm.toString());
console.log("Format: [[TN, FP], [FN, TP]]\n");

// Regression Metrics
console.log("2. Regression Metrics:");
console.log("-".repeat(50));

const y_true_reg = tensor([3.0, -0.5, 2.0, 7.0, 4.2]);
const y_pred_reg = tensor([2.5, 0.0, 2.1, 7.8, 4.0]);

console.log("True values:", y_true_reg.toString());
console.log("Predictions:", `${y_pred_reg.toString()}\n`);

const r2 = r2Score(y_true_reg, y_pred_reg);
const mseVal = mse(y_true_reg, y_pred_reg);
const rmseVal = rmse(y_true_reg, y_pred_reg);
const maeVal = mae(y_true_reg, y_pred_reg);
const mapeVal = mape(y_true_reg, y_pred_reg);

console.log(`R² Score: ${r2.toFixed(4)}`);
console.log(`MSE: ${mseVal.toFixed(4)}`);
console.log(`RMSE: ${rmseVal.toFixed(4)}`);
console.log(`MAE: ${maeVal.toFixed(4)}`);
// mape already returns a percentage, so don't multiply by 100 again
console.log(`MAPE: ${mapeVal.toFixed(2)}%\n`);

// Clustering Metrics
console.log("3. Clustering Metrics:");
console.log("-".repeat(50));

const X_cluster = tensor([
  [1, 2],
  [1.5, 1.8],
  [5, 8],
  [8, 8],
  [1, 0.6],
  [9, 11],
]);
const labels = tensor([0, 0, 1, 1, 0, 1]);

const silhouette = silhouetteScore(X_cluster, labels);
console.log(`Silhouette Score: ${silhouette.toFixed(4)}`);
console.log("Range: [-1, 1], higher is better");
console.log("Measures how similar points are to their own cluster\n");

console.log("Metric Selection Guide:");
console.log("• Classification: Use F1-score for imbalanced data");
console.log("• Regression: R² for variance explained, MAE for interpretability");
console.log("• Clustering: Silhouette for cluster quality");

console.log("\n✓ Metrics evaluation complete!");
```
Console Output
$ npx tsx 24-metrics/index.ts
=== Model Evaluation Metrics ===
1. Classification Metrics:
--------------------------------------------------
True labels: tensor([1, 0, 1, ..., 0, 1, 1], dtype=float32)
Predictions: tensor([1, 0, 1, ..., 1, 1, 1], dtype=float32)
Accuracy: 80.00%
Precision: 83.33%
Recall: 83.33%
F1-Score: 83.33%
Confusion Matrix:
tensor([[3, 1]
[1, 5]], dtype=float32)
Format: [[TN, FP], [FN, TP]]
2. Regression Metrics:
--------------------------------------------------
True values: tensor([3, -0.5000, 2, 7, 4.200], dtype=float32)
Predictions: tensor([2.500, 0, 2.100, 7.800, 4], dtype=float32)
R² Score: 0.9611
MSE: 0.2380
RMSE: 0.4879
MAE: 0.4200
MAPE: 27.57%
3. Clustering Metrics:
--------------------------------------------------
Silhouette Score: 0.7480
Range: [-1, 1], higher is better
Measures how similar points are to their own cluster
Metric Selection Guide:
• Classification: Use F1-score for imbalanced data
• Regression: R² for variance explained, MAE for interpretability
• Clustering: Silhouette for cluster quality
✓ Metrics evaluation complete!
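The regression numbers above can be verified by hand from the metric formulas. Here is a short sketch (plain TypeScript, not deepbox's implementation) that recomputes MSE, RMSE, MAE, and R² for the same five data points.

```typescript
// Regression metrics from first principles.
function regressionMetrics(yTrue: number[], yPred: number[]) {
  const n = yTrue.length;
  const errors = yTrue.map((y, i) => y - yPred[i]);
  const mse = errors.reduce((s, e) => s + e * e, 0) / n;       // mean squared error
  const rmse = Math.sqrt(mse);                                  // same units as y
  const mae = errors.reduce((s, e) => s + Math.abs(e), 0) / n;  // mean absolute error
  const mean = yTrue.reduce((s, y) => s + y, 0) / n;
  const ssTot = yTrue.reduce((s, y) => s + (y - mean) ** 2, 0); // total variance
  const r2 = 1 - (mse * n) / ssTot;                             // 1 − SS_res / SS_tot
  return { mse, rmse, mae, r2 };
}

// Same data as the example:
const r = regressionMetrics([3.0, -0.5, 2.0, 7.0, 4.2], [2.5, 0.0, 2.1, 7.8, 4.0]);
console.log(r); // mse 0.238, rmse ≈ 0.4879, mae ≈ 0.42, r2 ≈ 0.9611
```

Note that R² compares the residual error against the variance of the targets themselves, which is why a value near 1.0 means the model explains most of the variance in the data.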