Matthews Correlation Coefficient (MCC) Calculator

Enter the values from your confusion matrix to evaluate model performance.

Provide the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) values from your model's confusion matrix. The calculator will compute the MCC and other key performance metrics.

Practical Examples

Explore different scenarios to understand how MCC works.

Balanced High-Performing Model

A scenario where the model performs well on a balanced dataset.

TP: 90, FP: 10

TN: 85, FN: 15

Imbalanced Dataset

An example with an imbalanced dataset, where MCC is particularly useful.

TP: 95, FP: 5

TN: 9900, FN: 0

Poor Performing Model

A model that performs poorly, close to a random guess.

TP: 50, FP: 50

TN: 50, FN: 50

Perfect Prediction

A perfect model with no errors, resulting in the highest possible MCC score.

TP: 100, FP: 0

TN: 100, FN: 0

Understanding the Matthews Correlation Coefficient (MCC): A Comprehensive Guide
A deep dive into one of the most robust metrics for binary classification model evaluation.

1. What is the Matthews Correlation Coefficient (MCC)?

  • Defining MCC
  • The Confusion Matrix: The Foundation of MCC
  • Why MCC is a Superior Metric
The Matthews Correlation Coefficient (MCC), also known as the phi coefficient, is a statistical measure of the quality of binary classifications. It is widely regarded as one of the most balanced and informative metrics because it takes all four entries of the confusion matrix into account: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
The MCC produces a value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 indicates performance no better than random guessing, and -1 indicates total disagreement between prediction and observation. Unlike accuracy or the F1 score, MCC remains informative even on imbalanced datasets, making it a more reliable metric in many real-world scenarios.
The Confusion Matrix
To understand MCC, you must first understand the confusion matrix:
  • True Positives (TP): Positive cases correctly predicted as positive.
  • True Negatives (TN): Negative cases correctly predicted as negative.
  • False Positives (FP): Negative cases incorrectly predicted as positive (a Type I error).
  • False Negatives (FN): Positive cases incorrectly predicted as negative (a Type II error).
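To make these definitions concrete, here is a minimal Python sketch (illustrative only, with made-up labels) that tallies the four cells from paired lists of true and predicted labels:
```python
# Tally the four confusion-matrix cells from paired label lists.
# Convention here: 1 = positive class, 0 = negative class.
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 1) -> TP, FP, TN, FN
```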
Advantages of MCC
  • Imbalanced data: It provides a fair score even when the numbers of positive and negative samples differ greatly.
  • Symmetry: It does not matter which class is labeled 'positive'; swapping the positive and negative classes yields the same MCC value (verified numerically below).
  • Comprehensiveness: It is the geometric mean of the regression coefficients of the problem and its dual, summarizing performance in a single value.
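The symmetry claim is easy to check numerically with illustrative counts: relabelling the classes swaps TP with TN and FP with FN, and the score is unchanged (the formula itself is derived in the next section):
```python
from math import sqrt

def phi(tp, fp, tn, fn):
    # MCC formula, presented in full in section 2.
    return (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(phi(90, 10, 85, 15))  # ~0.751, original labelling
print(phi(85, 15, 90, 10))  # ~0.751, classes swapped
```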

2. Mathematical Derivation and Formula

  • The MCC Formula
  • Interpreting the Result
  • Handling Edge Cases
The MCC is calculated directly from the four values of the confusion matrix using a specific formula.
The Formula
MCC = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
The numerator, TP * TN - FP * FN, essentially measures the covariance between the predicted and actual values: a large positive value means the predictions align well with reality. The denominator is a normalization factor that scales the result to between -1 and +1; it is the square root of the product of the four row and column sums of the confusion matrix.
Edge Case: Zero in Denominator
A crucial edge case occurs if any of the four sums in the denominator is zero (e.g., the model always predicts 'positive', making TN + FN = 0). In this situation the denominator becomes zero, which would lead to a division-by-zero error. By convention, the MCC is defined as 0 in such cases, reflecting that the model has no predictive power.
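The formula and the zero-denominator convention translate directly into a few lines of code. This is a minimal illustrative sketch in Python, not the calculator's actual implementation:
```python
from math import sqrt

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from the four confusion-matrix cells."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: any zero row or column sum means no predictive power.
        return 0.0
    return (tp * tn - fp * fn) / denom

# A degenerate model that always predicts 'positive' has TN + FN = 0:
print(mcc(tp=100, fp=50, tn=0, fn=0))  # 0.0
```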

3. Step-by-Step Guide to Using the Calculator

  • Gathering Your Data
  • Entering the Values
  • Interpreting the Output
Step 1: Obtain Confusion Matrix Values
Before using the calculator, you need to have the results of your binary classification model summarized in a confusion matrix. This means you need four numbers: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN).
Step 2: Input the Values
Enter each of these four values into the corresponding fields in the calculator. The fields are clearly labeled. Ensure you are entering non-negative integers.
Step 3: Calculate and Analyze
Click the 'Calculate' button. The tool will instantly provide the Matthews Correlation Coefficient (MCC). Alongside the MCC, it will also compute other useful metrics like Accuracy, Precision, Recall (Sensitivity), Specificity, and the F1 Score, giving you a complete picture of your model's performance.
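For reference, the companion metrics can all be derived from the same four inputs. Here is a minimal sketch (the function name and dictionary keys are illustrative, not the calculator's internals):
```python
# Companion metrics from the four confusion-matrix cells,
# with guards against zero denominators.
def summary_metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

print(summary_metrics(90, 10, 85, 15))
```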

Example Calculation

  • Let TP = 90, FP = 5, TN = 85, FN = 10.
  • Numerator = (90 * 85) - (5 * 10) = 7650 - 50 = 7600.
  • Denominator = sqrt((90+5)*(90+10)*(85+5)*(85+10)) = sqrt(95 * 100 * 90 * 95) = sqrt(81,225,000) ≈ 9012.49
  • MCC = 7600 / 9012.49 ≈ 0.843
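The same arithmetic can be checked in a couple of lines of Python:
```python
from math import sqrt

num = 90 * 85 - 5 * 10                                    # 7600
den = sqrt((90 + 5) * (90 + 10) * (85 + 5) * (85 + 10))   # ~9012.49
print(round(num / den, 3))                                # 0.843
```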

4. Real-World Applications of MCC

  • Medical Diagnosis
  • Bioinformatics
  • Financial Fraud Detection
Medical Diagnosis
In medical testing, accurately identifying patients with a disease (true positives) and without a disease (true negatives) is critical. Because the number of healthy individuals often far outweighs the number of sick individuals (an imbalanced dataset), MCC is an excellent metric to evaluate the performance of diagnostic tests, as it isn't skewed by the large number of true negatives.
Bioinformatics
MCC is widely used in bioinformatics, for example in protein secondary structure prediction, where each residue's assignment (e.g., helix vs. non-helix) can be scored as a binary classification. MCC provides a standard way to measure the quality of these predictions.
Financial Fraud Detection
In fraud detection, fraudulent transactions (the 'positive' class) are very rare compared to legitimate ones. Accuracy would be a misleading metric, as a model that always predicts 'not fraud' would have very high accuracy. MCC provides a much more realistic assessment of a fraud detection model's ability to distinguish between fraudulent and legitimate activities.

5. Common Misconceptions and Correct Methods

  • Accuracy vs. MCC
  • F1 Score vs. MCC
  • When to Use MCC
Misconception: 'High Accuracy Means a Good Model'
This is the most common fallacy, especially with imbalanced data. As mentioned in the fraud detection example, a model can achieve over 99% accuracy by simply guessing the majority class. MCC avoids this pitfall by incorporating the balance of all four confusion matrix cells, scoring such a naive model at or near zero.
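A quick numerical illustration with hypothetical counts: on 9,900 legitimate and 100 fraudulent transactions, a model that always predicts 'not fraud' reaches 99% accuracy while its MCC is 0:
```python
from math import sqrt

tp, fp, tn, fn = 0, 0, 9900, 100          # always predicts the majority class
accuracy = (tp + tn) / (tp + fp + tn + fn)
den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = (tp * tn - fp * fn) / den if den else 0.0  # TP + FP = 0 -> convention: 0
print(accuracy, mcc)                      # 0.99 0.0
```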
Misconception: 'F1 Score is Always Sufficient'
The F1 score, the harmonic mean of precision and recall, is a useful metric, but it ignores the True Negatives. This can be problematic: in a medical test for a rare disease, for example, the number of true negatives (correctly identified healthy people) is a very important piece of information that the F1 score overlooks. MCC, by contrast, uses all four cells and thus provides a more complete summary, as the sketch below demonstrates.
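Holding TP, FP, and FN fixed while varying only TN makes the difference visible with illustrative counts: the F1 score cannot change, but the MCC does:
```python
from math import sqrt

def mcc(tp, fp, tn, fn):
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)  # TN never appears

# Same TP/FP/FN, different TN: identical F1, very different MCC.
print(f1(90, 10, 10), mcc(90, 10, 10, 10))     # 0.9  0.4
print(f1(90, 10, 10), mcc(90, 10, 1000, 10))   # 0.9  ~0.89
```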
When Should You Prioritize MCC?
You should consider MCC your primary metric whenever you are dealing with a binary classification task, but it becomes especially crucial when your dataset is imbalanced. Its ability to provide a single, interpretable, and balanced summary of performance makes it an indispensable tool for data scientists and researchers.