Classification accuracy metrics are grounded in probability theory and statistical inference: each metric estimates a probability from confusion-matrix counts, which gives a rigorous basis for evaluating and comparing classifiers.
Mathematical Foundations:
Accuracy: A = (TP + TN) / N, where TP, FP, TN, and FN are the counts of true positives, false positives, true negatives, and false negatives, and N = TP + FP + TN + FN is the total sample size. This estimates the probability of correct classification: P(Correct) = P(Positive ∩ Predicted Positive) + P(Negative ∩ Predicted Negative).
Precision: P = TP / (TP + FP) = P(Actual Positive | Predicted Positive). This conditional probability measures the reliability of positive predictions. By Bayes' theorem, it depends on the prevalence of the positive class as well as on the classifier's sensitivity and specificity.
Recall (Sensitivity): R = TP / (TP + FN) = P(Predicted Positive | Actual Positive). This measures the true positive rate and relates to statistical power in hypothesis testing.
Specificity: S = TN / (TN + FP) = P(Predicted Negative | Actual Negative). This measures the true negative rate and complements sensitivity in ROC analysis.
F1-Score: F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). This harmonic mean of precision and recall gives a single metric that balances the two. Because it ignores TN, it is particularly useful on imbalanced datasets where the positive class is rare.
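The definitions above can be checked numerically with a minimal sketch. The `ConfusionCounts` class, the `binary_metrics` function, and the example counts are hypothetical names and values chosen for illustration. Returning NaN for an undefined ratio is an assumption, not a convention stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper for illustration)."""
    tp: int
    fp: int
    tn: int
    fn: int


def safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero (assumed convention)."""
    return num / den if den else float("nan")


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute accuracy, precision, recall, specificity, and F1 from counts."""
    n = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": safe_div(c.tp + c.tn, n),
        "precision": safe_div(c.tp, c.tp + c.fp),
        "recall": safe_div(c.tp, c.tp + c.fn),
        "specificity": safe_div(c.tn, c.tn + c.fp),
        # Count form of F1; equals 2PR / (P + R) whenever both are defined.
        "f1": safe_div(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Made-up counts for illustration only.
metrics = binary_metrics(ConfusionCounts(tp=40, fp=10, tn=45, fn=5))
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the count form 2TP / (2TP + FP + FN) and the harmonic-mean form 2PR / (P + R) give the same F1 value, which is a quick way to confirm the identity in the F1 definition.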
Advanced Statistical Relationships:
- ROC Analysis: True Positive Rate (Recall) vs. False Positive Rate (1-Specificity) provides comprehensive performance visualization across classification thresholds.
- Information Theory: The mutual information between actual classes Y and predictions Ŷ, I(Y; Ŷ) = H(Y) − H(Y | Ŷ), quantifies how much knowing the prediction reduces uncertainty (entropy) about the actual class.
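- Confidence Intervals: For accuracy A estimated from n samples, the 95% confidence interval under the normal approximation is approximately A ± 1.96√(A(1-A)/n). The approximation is reasonable when n is large and A is not close to 0 or 1, and it supports statistical significance testing.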
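As a rough illustration of two of the relationships above, the following sketch computes the normal-approximation interval for accuracy and the mutual information between actual and predicted classes from binary counts. The function names and example counts are hypothetical. Clipping the interval to [0, 1] is an added assumption, not part of the formula.

```python
import math


def accuracy_wald_interval(accuracy: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval A ± z * sqrt(A(1 - A) / n), clipped to [0, 1]."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


def mutual_information_bits(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mutual information I(actual; predicted) in bits, from binary counts."""
    n = tp + fp + tn + fn
    # joint[actual][predicted], where index 1 = positive and 0 = negative.
    joint = [[tn / n, fp / n], [fn / n, tp / n]]
    p_actual = [sum(row) for row in joint]
    p_pred = [joint[0][j] + joint[1][j] for j in range(2)]
    mi = 0.0
    for a in range(2):
        for p in range(2):
            if joint[a][p] > 0:  # zero-probability cells contribute nothing
                mi += joint[a][p] * math.log2(joint[a][p] / (p_actual[a] * p_pred[p]))
    return mi


# Made-up values for illustration only.
low, high = accuracy_wald_interval(0.85, 100)
mi = mutual_information_bits(tp=40, fp=10, tn=45, fn=5)
print(f"95% CI for accuracy: [{low:.4f}, {high:.4f}]")
print(f"mutual information: {mi:.4f} bits")
```

For a balanced binary problem, mutual information ranges from 0 bits, when predictions are independent of the actual class, to 1 bit, when predictions are perfect.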
Multi-Class Extensions:
- Macro-averaged: Compute metrics for each class separately, then average: Precision_macro = (1/k) Σ_i Precision_i for k classes.
- Micro-averaged: Aggregate all TP, FP, FN across classes, then compute: Precision_micro = Σ_i TP_i / (Σ_i TP_i + Σ_i FP_i).
- Weighted Metrics: Weight each class by its relative frequency to account for class imbalance: Precision_weighted = Σ_i (n_i/n) × Precision_i, where n_i is the number of actual samples in class i and n = Σ_i n_i is the total sample size.
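The three averaging schemes can be compared side by side with a small sketch. It assumes a k × k confusion matrix whose rows are actual classes and whose columns are predicted classes. The `precision_averages` function and the example matrix are hypothetical. Scoring a class that is never predicted as 0.0 is an assumed convention.

```python
def precision_averages(confusion: list[list[int]]) -> dict[str, float]:
    """Macro, micro, and weighted precision from a k x k confusion matrix.

    Assumes confusion[actual][predicted] holds sample counts.
    """
    k = len(confusion)
    tp = [confusion[i][i] for i in range(k)]
    # Column sums: how often each class was predicted (TP_i + FP_i).
    predicted = [sum(confusion[r][i] for r in range(k)) for i in range(k)]
    # Row sums: number of actual samples per class (n_i).
    support = [sum(confusion[i]) for i in range(k)]
    n = sum(support)
    # Assumed convention: precision is 0.0 for a class that is never predicted.
    per_class = [tp[i] / predicted[i] if predicted[i] else 0.0 for i in range(k)]
    return {
        "macro": sum(per_class) / k,
        "micro": sum(tp) / sum(predicted),
        "weighted": sum(support[i] / n * per_class[i] for i in range(k)),
    }


# Made-up 3-class confusion matrix for illustration only.
example = [
    [50, 6, 4],
    [12, 10, 3],
    [8, 2, 5],
]
averages = precision_averages(example)
for name, value in averages.items():
    print(f"{name}: {value:.4f}")
```

In this example the majority class has the highest precision, so the macro average is the lowest of the three. In single-label problems like this one, micro-averaged precision coincides with overall accuracy, because every sample is predicted exactly once.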