Confusion Matrix#
Guide: True Positive / False Positive / False Negative / True Negative
You can find more info on true positive, false positive, false negative, and true negative in the TP / FP / FN / TN guide.
A confusion matrix is a table that summarizes classification model performance by counting objects for each combination of predicted class (columns) and actual class (rows). Each cell counts the objects with that actual class and predicted class, which shows how confused a model is: a model is confused when a predicted class does not match the actual class. When they do match, the prediction is a true positive (TP). In general, a model that produces more true positives (TPs) and true negatives (TNs) with fewer false positives (FPs) and false negatives (FNs) is better.
Confusion matrices are used in classification workflows with a single class or with multiple classes, and extend to object detection workflows as well. They help evaluate models by counting classification errors and visualizing class imbalances.
- API Reference: `ConfusionMatrix` ↗
Example
To see an example of a confusion matrix, check out CIFAR-10 on app.kolena.com/try.
Implementation Details#
The implementation of a confusion matrix depends on whether the workflow involves a single class or multiple classes.
Single-Class#
Single-class confusion matrices are used for binary classification problems. After computing the number of TPs, FPs, FNs, and TNs, a confusion matrix would look like this:
| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
Example: Single-Class#
Let's consider a simple binary classification example and plot a confusion matrix. The table below shows the ground truth and inference labels for five samples (three positive and two negative).
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 |
| --- | --- | --- | --- | --- | --- |
| Ground Truth | `Cat` | `Cat` | `Cat` | `No Cat` | `No Cat` |
| Inference | `Cat` | `No Cat` | `No Cat` | `No Cat` | `Cat` |
A confusion matrix for this example can be plotted:
| | Predicted `Cat` | Predicted `No Cat` |
| --- | --- | --- |
| `Cat` | 1 | 2 |
| `No Cat` | 1 | 1 |
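The same counts can be reproduced with a short script. This is a minimal sketch that tallies the five samples from the table, assuming scikit-learn is available (it is not part of the `ConfusionMatrix` API referenced above):

```python
from sklearn.metrics import confusion_matrix

# Ground truth and inference labels for the five samples above
y_true = ["Cat", "Cat", "Cat", "No Cat", "No Cat"]
y_pred = ["Cat", "No Cat", "No Cat", "No Cat", "Cat"]

# Rows are actual classes, columns are predicted classes;
# `labels` fixes the ordering so "Cat" comes first.
cm = confusion_matrix(y_true, y_pred, labels=["Cat", "No Cat"])
print(cm)
# [[1 2]   <- actual Cat:    1 TP, 2 FN
#  [1 1]]  <- actual No Cat: 1 FP, 1 TN
```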
Multiclass#
Multiclass confusion matrices, used for multiclass classification problems, count objects for every unique pair of actual and predicted labels, from which per-class TP, FP, FN, and TN counts are derived. A multiclass confusion matrix with three classes would have the following format:
| | Predicted `Airplane` | Predicted `Boat` | Predicted `Car` |
| --- | --- | --- | --- |
| Actual `Airplane` | Correct Prediction | Incorrect Prediction | Incorrect Prediction |
| Actual `Boat` | Incorrect Prediction | Correct Prediction | Incorrect Prediction |
| Actual `Car` | Incorrect Prediction | Incorrect Prediction | Correct Prediction |
For example, if we want to calculate the counts of TP, FP, FN, and TN for the class `Boat`:
| | Predicted `Airplane` | Predicted `Boat` | Predicted `Car` |
| --- | --- | --- | --- |
| Actual `Airplane` | True Negative | False Positive | True Negative |
| Actual `Boat` | False Negative | True Positive | False Negative |
| Actual `Car` | True Negative | False Positive | True Negative |
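In code, these per-class counts can be read directly off the matrix. Below is a minimal NumPy sketch (an illustration, not part of any particular API) that extracts TP, FP, FN, and TN for a single class from a matrix whose rows are actual classes and whose columns are predicted classes; the counts used here are hypothetical.

```python
import numpy as np

def per_class_counts(cm, i):
    """Return (TP, FP, FN, TN) for class index i of a confusion matrix
    with actual classes as rows and predicted classes as columns."""
    tp = cm[i, i]                 # actual i predicted as i
    fp = cm[:, i].sum() - tp      # other classes predicted as i
    fn = cm[i, :].sum() - tp      # actual i predicted as something else
    tn = cm.sum() - tp - fp - fn  # everything else
    return int(tp), int(fp), int(fn), int(tn)

# Hypothetical counts, ordered [Airplane, Boat, Car]
cm = np.array([
    [50,  3,  2],
    [ 4, 60,  6],
    [ 1,  5, 70],
])
print(per_class_counts(cm, i=1))  # counts for Boat: (60, 8, 10, 123)
```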
Example: Multiclass#
Let's take a look at a multiclass classification example and plot a confusion matrix. In this example, we have three classes: `Airplane`, `Boat`, and `Car`. The multiclass classifier outputs the following inferences:
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Ground Truth | `Airplane` | `Boat` | `Car` | `Airplane` | `Boat` | `Boat` |
| Inference | `Airplane` | `Boat` | `Airplane` | `Airplane` | `Boat` | `Car` |
A confusion matrix for this example can be plotted:
| | Predicted `Airplane` | Predicted `Boat` | Predicted `Car` |
| --- | --- | --- | --- |
| `Airplane` | 2 | 0 | 0 |
| `Boat` | 0 | 2 | 1 |
| `Car` | 1 | 0 | 0 |
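As with the single-class case, this matrix can be reproduced programmatically. A minimal sketch, again assuming scikit-learn:

```python
from sklearn.metrics import confusion_matrix

# Ground truth and inference labels for the six samples above
y_true = ["Airplane", "Boat", "Car", "Airplane", "Boat", "Boat"]
y_pred = ["Airplane", "Boat", "Airplane", "Airplane", "Boat", "Car"]

labels = ["Airplane", "Boat", "Car"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[2 0 0]   <- actual Airplane
#  [0 2 1]   <- actual Boat
#  [1 0 0]]  <- actual Car
```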
In a different case, these counts may be much higher:
| | Predicted `Airplane` | Predicted `Boat` | Predicted `Car` |
| --- | --- | --- | --- |
| `Airplane` | 200 | 0 | 0 |
| `Boat` | 100 | 8,800 | 600 |
| `Car` | 100 | 0 | 1,000 |
This confusion matrix reveals that the model is very good at identifying the `Boat` class: 8,800 of 9,500 `Boat`s were correctly predicted. Of the 700 misclassified `Boat`s, 600 were predicted as `Car` and 100 were predicted as `Airplane`.
This confusion matrix also indicates that when the model makes an `Airplane` inference, it is correct half the time. When it is incorrect, it has labeled a `Boat` or a `Car` as an `Airplane`.

Whenever the actual class is `Airplane`, the model never predicts a different transportation object.
Normalization#
Sometimes it is easier to focus on class-level behavior by using a normalized confusion matrix. If confusion matrices are color-coded, normalizing can also create a better visual representation.
You can normalize a confusion matrix by `row` (actual classes), `column` (predicted classes), or `all` (entire matrix). Each type of normalization surfaces different information, as outlined below.
Normalizing by `row`
For each actual class, this normalization shows the proportion of objects that were correctly or incorrectly predicted as each class. Notice that the diagonal values from this normalization match the recall per class. To normalize by `row`, divide each entry in a row by the sum of values within it. If we normalize the larger multiclass example by `row`, we get:
| | Predicted `Airplane` | Predicted `Boat` | Predicted `Car` |
| --- | --- | --- | --- |
| `Airplane` | 1 | 0 | 0 |
| `Boat` | 0.01 | 0.93 | 0.06 |
| `Car` | 0.09 | 0 | 0.91 |
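A minimal NumPy sketch of this row normalization, using the counts from the larger example (NumPy here is an assumption, not a documented dependency):

```python
import numpy as np

# Counts from the larger example, ordered [Airplane, Boat, Car]
cm = np.array([
    [200,    0,    0],
    [100, 8800,  600],
    [100,    0, 1000],
])

# Divide each row by its sum (the total number of objects per actual class)
row_normalized = cm / cm.sum(axis=1, keepdims=True)
print(row_normalized.round(2))
# [[1.   0.   0.  ]
#  [0.01 0.93 0.06]
#  [0.09 0.   0.91]]
```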
Normalizing by `column`
For each predicted class, this normalization shows the proportion of instances predicted as that class that actually belong to each true class. Notice that the diagonal values from this normalization match the precision per class. To normalize by `column`, divide each entry in a column by the sum of values within that column. If we normalize the larger multiclass example by `column`, we get:
| | Predicted `Airplane` | Predicted `Boat` | Predicted `Car` |
| --- | --- | --- | --- |
| `Airplane` | 0.5 | 0 | 0 |
| `Boat` | 0.25 | 1 | 0.375 |
| `Car` | 0.25 | 0 | 0.625 |
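The corresponding column normalization, sketched the same way:

```python
import numpy as np

# Counts from the larger example, ordered [Airplane, Boat, Car]
cm = np.array([
    [200,    0,    0],
    [100, 8800,  600],
    [100,    0, 1000],
])

# Divide each column by its sum (the total number of objects per predicted class)
column_normalized = cm / cm.sum(axis=0, keepdims=True)
print(column_normalized.round(3))
# [[0.5   0.    0.   ]
#  [0.25  1.    0.375]
#  [0.25  0.    0.625]]
```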
Normalizing by `all`
For each entry, this normalization shows the overall proportion of instances that fall into that combination of actual and predicted class. To normalize by `all`, divide each entry by the total sum of all the values in the matrix. If we normalize the larger multiclass example by `all`, we get:
| | Predicted `Airplane` | Predicted `Boat` | Predicted `Car` |
| --- | --- | --- | --- |
| `Airplane` | 0.02 | 0 | 0 |
| `Boat` | 0.01 | 0.81 | 0.06 |
| `Car` | 0.01 | 0 | 0.09 |
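And the same sketch for normalizing over the entire matrix:

```python
import numpy as np

# Counts from the larger example, ordered [Airplane, Boat, Car]
cm = np.array([
    [200,    0,    0],
    [100, 8800,  600],
    [100,    0, 1000],
])

# Divide every entry by the total number of objects in the matrix
all_normalized = cm / cm.sum()
print(all_normalized.round(2))
# [[0.02 0.   0.  ]
#  [0.01 0.81 0.06]
#  [0.01 0.   0.09]]
```

Note that scikit-learn's `confusion_matrix` exposes the same three options through its `normalize` parameter (`"true"`, `"pred"`, or `"all"`).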
Limitations and Biases#
Confusion matrices are great for evaluating models that deal with multiple classes. They are structured tables of numbers, which is both their strength and their weakness.
- Class imbalance: Confusion matrices can appear biased when the numbers of instances per class are imbalanced, because the raw counts are dominated by the majority class. This can be addressed by normalizing the matrix.
- Categorical evaluation: Confusion matrices have categorical outputs and do not surface any details about misclassifications. All misclassifications are treated equally, so when classes are similar or hierarchically related, a confusion matrix will not account for these relationships.