deepbox/metrics

Classification Metrics

Evaluate classification model performance with threshold-dependent metrics (accuracy, precision, recall, F1), threshold-independent metrics (ROC AUC, average precision), and agreement metrics (Cohen's Kappa, MCC). All functions accept Tensor inputs. Multi-class averaging strategies: 'binary' (default; scores the positive class of a binary problem), 'macro' (unweighted class average), 'micro' (global TP/FP/FN), 'weighted' (class-size-weighted average).
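
A minimal sketch of the averaging strategies on a three-class problem, assuming the average parameter behaves as documented for the functions below; the data is illustrative:

import { precision, f1Score } from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

// Three classes (0, 1, 2); class 2 has only one sample.
const yTrue = tensor([0, 0, 1, 1, 1, 2]);
const yPred = tensor([0, 1, 1, 1, 2, 2]);

precision(yTrue, yPred, "macro");  // per-class precision, averaged equally
precision(yTrue, yPred, "micro");  // TP/FP pooled across classes before dividing
f1Score(yTrue, yPred, "weighted"); // per-class F1, weighted by class support
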
accuracy
accuracy(yTrue: Tensor, yPred: Tensor): number

Fraction of correct predictions: (TP + TN) / total. Simple but can be misleading with imbalanced classes.
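
A sketch of why: under heavy class imbalance, a model that always predicts the majority class still scores high accuracy (the 0.9 and 0.0 follow directly from the definitions):

import { accuracy, recall } from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

// 9 negatives, 1 positive; the model predicts all negatives.
const yTrue = tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 1]);
const yPred = tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);

accuracy(yTrue, yPred); // 0.9, yet no positive was found
recall(yTrue, yPred);   // 0.0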

precision
precision(yTrue: Tensor, yPred: Tensor, average?: string): number

Fraction of positive predictions that are correct: TP / (TP + FP). High precision means few false positives.

Parameters:
average: 'binary' | 'macro' | 'micro' | 'weighted' - Averaging strategy for multi-class (default: 'binary')
recall
recall(yTrue: Tensor, yPred: Tensor, average?: string): number

Fraction of actual positives correctly identified: TP / (TP + FN). Also called sensitivity or true positive rate.

f1Score
f1Score(yTrue: Tensor, yPred: Tensor, average?: string): number

Harmonic mean of precision and recall: 2·P·R / (P + R). Balances precision and recall into a single metric.

fbetaScore
fbetaScore(yTrue: Tensor, yPred: Tensor, beta: number, average?: string): number

Weighted harmonic mean: (1 + β²)·P·R / (β²·P + R). β > 1 weights recall higher; β < 1 weights precision higher.
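
A short sketch contrasting β values on the same predictions (data reused from the code sample at the end of this page):

import { fbetaScore } from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

const yTrue = tensor([0, 1, 1, 0, 1, 0, 1, 1]);
const yPred = tensor([0, 1, 0, 0, 1, 1, 1, 1]);

fbetaScore(yTrue, yPred, 2);   // F2: recall weighted higher
fbetaScore(yTrue, yPred, 0.5); // F0.5: precision weighted higher
fbetaScore(yTrue, yPred, 1);   // F1: equal weights, equivalent to f1Score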

confusionMatrix
confusionMatrix(yTrue: Tensor, yPred: Tensor): Tensor

Compute the confusion matrix. Entry [i, j] is the count of samples with true label i predicted as j. Shape: [nClasses, nClasses].

classificationReport
classificationReport(yTrue: Tensor, yPred: Tensor): ClassificationReport

Per-class precision, recall, F1 and support, plus macro/weighted averages.

rocAucScore
rocAucScore(yTrue: Tensor, yScore: Tensor): number

Area Under the ROC Curve. Measures the ability to distinguish between classes across all thresholds. 1.0 = perfect, 0.5 = random.

rocCurve
rocCurve(yTrue: Tensor, yScore: Tensor): { fpr: Tensor; tpr: Tensor; thresholds: Tensor }

Compute the Receiver Operating Characteristic curve. Returns false positive rate, true positive rate, and thresholds.
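
Usage sketch, assuming the three returned tensors are aligned index by index, so (fpr[i], tpr[i]) is the operating point at thresholds[i]:

import { rocCurve, rocAucScore } from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

const yTrue = tensor([0, 1, 1, 0, 1, 0, 1, 1]);
const yScore = tensor([0.1, 0.9, 0.4, 0.2, 0.8, 0.6, 0.95, 0.7]);

// Plotting tpr against fpr traces the ROC curve;
// rocAucScore computes its area directly.
const { fpr, tpr, thresholds } = rocCurve(yTrue, yScore);
rocAucScore(yTrue, yScore);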

precisionRecallCurve
precisionRecallCurve(yTrue: Tensor, yScore: Tensor): { precision: Tensor; recall: Tensor; thresholds: Tensor }

Compute the precision-recall curve across different thresholds.

averagePrecisionScore
averagePrecisionScore(yTrue: Tensor, yScore: Tensor): number

Area under the precision-recall curve. Summarizes the curve into a single number.
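
A sketch tying the curve to its summary score, using the same scores as the ROC example above:

import { precisionRecallCurve, averagePrecisionScore } from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

const yTrue = tensor([0, 1, 1, 0, 1, 0, 1, 1]);
const yScore = tensor([0.1, 0.9, 0.4, 0.2, 0.8, 0.6, 0.95, 0.7]);

// Each index pairs a precision/recall trade-off with its threshold;
// averagePrecisionScore condenses the whole curve into one number.
const { precision: p, recall: r, thresholds } = precisionRecallCurve(yTrue, yScore);
averagePrecisionScore(yTrue, yScore);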

logLoss
logLoss(yTrue: Tensor, yProb: Tensor): number

Logarithmic loss (cross-entropy). Measures the quality of predicted probabilities. Lower is better.
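
A sketch of the behaviour implied by the formula further down this page: confident correct probabilities give a small loss, confident wrong ones a large loss:

import { logLoss } from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

const yTrue = tensor([0, 1, 1, 0]);

logLoss(yTrue, tensor([0.1, 0.9, 0.8, 0.2])); // small: confident and correct
logLoss(yTrue, tensor([0.9, 0.1, 0.2, 0.8])); // large: confident and wrong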

matthewsCorrcoef
matthewsCorrcoef(yTrue: Tensor, yPred: Tensor): number

Matthews Correlation Coefficient. Balanced measure for binary classification even with imbalanced classes. Range: [−1, 1].

cohenKappaScore
cohenKappaScore(yTrue: Tensor, yPred: Tensor): number

Cohen's Kappa. Measures agreement between predicted and true labels, adjusted for chance agreement.

balancedAccuracyScore
balancedAccuracyScore(yTrue: Tensor, yPred: Tensor): number

Average of recall for each class. Handles class imbalance better than standard accuracy.

hammingLoss
hammingLoss(yTrue: Tensor, yPred: Tensor): number

Fraction of labels that are incorrectly predicted. Equal to 1 − accuracy for single-label classification.
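
A quick check of that identity on the data from the code sample at the end of this page:

import { accuracy, hammingLoss } from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

const yTrue = tensor([0, 1, 1, 0, 1, 0, 1, 1]);
const yPred = tensor([0, 1, 0, 0, 1, 1, 1, 1]);

hammingLoss(yTrue, yPred); // 0.25 (2 of 8 labels wrong)
accuracy(yTrue, yPred);    // 0.75 = 1 − 0.25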

jaccardScore
jaccardScore(yTrue: Tensor, yPred: Tensor, average?: string): number

Jaccard similarity score (intersection over union) for classification.

Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:

  • TP = True Positives
  • TN = True Negatives
  • FP = False Positives
  • FN = False Negatives

Precision

Precision = TP / (TP + FP)

Where:

  • FP = False alarms (predicted positive but actually negative)

Recall (Sensitivity)

Recall = TP / (TP + FN)

Where:

  • FN = Missed detections (actually positive but predicted negative)

F1 Score

F1 = 2 · Precision · Recall / (Precision + Recall)

Where:

  • F1 = Harmonic mean of precision and recall

Fβ Score

Fβ = (1 + β²) · P · R / (β² · P + R)

Where:

  • β = Weight on recall relative to precision: β > 1 favors recall, β < 1 favors precision

Log Loss (Cross-Entropy)

L = −(1/n) Σᵢ [yᵢ log(pᵢ) + (1−yᵢ) log(1−pᵢ)]

Where:

  • yᵢ = True label for sample i (0 or 1)
  • pᵢ = Predicted probability that sample i is positive
  • n = Number of samples

Matthews Correlation Coefficient

MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

Where:

  • MCC = Range [−1, 1]: 1 = perfect, 0 = random, −1 = total disagreement
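
Worked example, using the counts implied by the confusion matrix in the code sample below (TP = 4, TN = 2, FP = 1, FN = 1):

MCC = (4·2 − 1·1) / √((4+1)(4+1)(2+1)(2+1)) = 7 / √225 = 7/15 ≈ 0.467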

Jaccard Index (IoU)

J = TP / (TP + FP + FN)

Where:

  • J = Intersection over Union
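
With the same counts as in the MCC example (TP = 4, FP = 1, FN = 1):

J = 4 / (4 + 1 + 1) = 4/6 ≈ 0.667
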
classification-metrics.ts
import {
  accuracy,
  precision,
  recall,
  f1Score,
  confusionMatrix,
  rocAucScore,
  classificationReport
} from "deepbox/metrics";
import { tensor } from "deepbox/ndarray";

const yTrue = tensor([0, 1, 1, 0, 1, 0, 1, 1]);
const yPred = tensor([0, 1, 0, 0, 1, 1, 1, 1]);

accuracy(yTrue, yPred);  // 0.75
precision(yTrue, yPred); // TP / (TP + FP)
recall(yTrue, yPred);    // TP / (TP + FN)
f1Score(yTrue, yPred);   // harmonic mean of P and R

const cm = confusionMatrix(yTrue, yPred);
// [[2, 1],   ← true 0: 2 correct, 1 misclassified
//  [1, 4]]   ← true 1: 1 misclassified, 4 correct

// ROC AUC (requires probability scores)
const yScore = tensor([0.1, 0.9, 0.4, 0.2, 0.8, 0.6, 0.95, 0.7]);
rocAucScore(yTrue, yScore);  // Area under ROC curve

// Full report
const report = classificationReport(yTrue, yPred);