Confusion Matrix

A Simple Explanation - By Varsha Saini

Working on a machine learning project is not only about building a model; model evaluation is an equally important step, as it measures the effectiveness of the model you have built. A confusion matrix is one tool for understanding a model's performance.

What is the Confusion Matrix?

A confusion matrix is a tabular representation of actual versus predicted values that can be used to measure the performance of a machine learning model.

A confusion matrix is used to evaluate classification models, and it works for both binary and multiclass classification.

Confusion Matrix for Binary Classification

The image below represents a confusion matrix for binary classification, which has two classes, class 0 and class 1. The values on the left represent the actual class, and the values along the top represent the class predicted by the machine learning model.

It has four important terms: true positive, true negative, false positive, and false negative. Let's understand each in detail.

1. True Positive (TP)

A value is a true positive if both the actual and predicted classes are positive (belong to class 1).

2. False Positive (FP)

A value is a false positive if the actual class is negative but it is predicted as positive. Since the prediction is incorrect, it is considered an error (a Type I error).

3. True Negative (TN)

A value is a true negative if both the actual and predicted classes are negative (belong to class 0).

4. False Negative (FN)

A value is a false negative if the actual class is positive but it is predicted as negative. Since the prediction is incorrect, it is considered an error (a Type II error).
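The four counts above can be computed directly from a list of actual and predicted labels. Here is a minimal sketch in plain Python, using made-up labels purely for illustration:

```python
# Hypothetical actual and predicted labels; class 1 is the "positive" class.
y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each of the four confusion-matrix cells by comparing label pairs.
tp = sum(1 for a, p in zip(y_actual, y_predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(y_actual, y_predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(y_actual, y_predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(y_actual, y_predicted) if a == 1 and p == 0)

print(tp, tn, fp, fn)  # 3 3 1 1
```

Libraries such as scikit-learn provide the same result in one call (`confusion_matrix`), but counting by hand makes the four terms concrete.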

Why Do We Need a Confusion Matrix?

A confusion matrix can be used to calculate important model evaluation metrics like Accuracy, Recall, Precision, F1 Score, etc.

1. Accuracy

Accuracy is defined as the ratio of correctly predicted values to the total number of values: Accuracy = (TP + TN) / (TP + TN + FP + FN). It is a suitable metric when the dataset is balanced.
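As a quick sketch, accuracy can be computed from the four confusion-matrix counts (the values below are assumed for illustration):

```python
# Hypothetical counts from a confusion matrix.
tp, tn, fp, fn = 6, 80, 4, 10

# Accuracy: correct predictions over all predictions.
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.86
```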

2. Recall or Sensitivity

Recall is defined as the ratio of the number of values correctly predicted as positive to the total number of values that actually belong to the positive class: Recall = TP / (TP + FN). It is preferred when false negatives are more costly.
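A minimal sketch of recall, again using assumed counts (6 true positives, 10 false negatives):

```python
# Hypothetical counts: actual positives are split into caught (tp) and missed (fn).
tp, fn = 6, 10

# Recall: what fraction of the actual positives did the model find?
recall = tp / (tp + fn)
print(recall)  # 0.375
```

Note how heavily the 10 missed positives drag recall down, which is why this metric matters when false negatives are costly.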

3. Precision

Precision is defined as the ratio of the number of values that actually belong to the positive class to the total number of values predicted as positive: Precision = TP / (TP + FP). It is preferred when false positives are more costly. (Precision should not be confused with specificity, which is TN / (TN + FP), the true negative rate.)
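A minimal sketch of precision, with the same assumed counts (6 true positives, 4 false positives):

```python
# Hypothetical counts: predicted positives are split into correct (tp) and wrong (fp).
tp, fp = 6, 4

# Precision: of everything predicted positive, what fraction really was positive?
precision = tp / (tp + fp)
print(precision)  # 0.6
```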

4. F1 Score

The F1 Score is the harmonic mean of precision and recall: F1 = 2 × (Precision × Recall) / (Precision + Recall). It is used when false positives and false negatives are equally important, which also makes it a good choice for imbalanced datasets.
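Putting the pieces together, the F1 score can be computed from the same assumed counts used above:

```python
# Hypothetical counts carried over from the earlier sketches.
tp, fp, fn = 6, 4, 10

precision = tp / (tp + fp)   # 0.6
recall = tp / (tp + fn)      # 0.375

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.462
```

The harmonic mean punishes imbalance: even though precision is 0.6, the weak recall of 0.375 pulls F1 down to roughly 0.46.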