A calibration curve (sometimes called a “reliability diagram”) tells you whether your model’s predicted probabilities accurately reflect the real chance of your model being right.

It’s very common for neural networks to be overconfident in their predictions, and a calibration curve helps us detect when this happens. Here’s an example:

On the x-axis we have the model’s predicted confidence. On the y-axis we plot the model’s accuracy on the predictions made with that confidence. We can see from this particular diagram that the model is “overconfident” when it makes a prediction with confidence between 0.5 and 0.7: in that range, its accuracy is lower than its predicted confidence.

Calibration curves for multiclass classifiers

scikit-learn provides a function, `sklearn.calibration.calibration_curve`, to compute calibration curves for binary classification problems. However, we often want the calibration curve of a model that predicts more than two classes.
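One common workaround, shown here as an illustration rather than the approach this post follows, is to reduce the multiclass problem to a binary one: take each sample's top-class probability as its confidence and ask only whether that top prediction was correct. The sketch below assumes NumPy, scikit-learn, an `(N, K)` array of softmax probabilities and integer labels; the helper name `top_label_calibration_curve` is hypothetical.

```python
import numpy as np
from sklearn.calibration import calibration_curve


def top_label_calibration_curve(probs, labels, n_bins=10):
    """Hypothetical helper: reuse scikit-learn's binary calibration_curve
    for a multiclass model by scoring only the top-1 prediction.

    probs:  (N, K) array of predicted class probabilities (e.g. softmax outputs).
    labels: (N,) array of integer ground-truth class indices.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)                 # probability of the predicted class
    predictions = probs.argmax(axis=1)              # predicted class index
    correct = (predictions == labels).astype(int)   # 1 if the top prediction is right
    # prob_true: fraction correct (accuracy) in each non-empty bin.
    # prob_pred: mean confidence in each non-empty bin. Empty bins are dropped.
    prob_true, prob_pred = calibration_curve(correct, confidences, n_bins=n_bins)
    return prob_true, prob_pred
```

Because `calibration_curve` omits empty bins, the returned arrays can be shorter than `n_bins`; plot `prob_true` against `prob_pred` to get the curve.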

For the multiclass case, we can follow the approach Guo et al. (2017) use to generate their calibration curve plots.

They propose “binning” all predicted confidences into $M$ equally wide bins. Let $B_m$ be the set of indices of samples whose predicted confidence falls into the interval $I_m = \left(\frac{m-1}{M}, \frac{m}{M}\right]$, for $m = 1, \dots, M$.
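A minimal NumPy sketch of this binning step, assuming confidences are floats in $[0, 1]$. The function name `bin_indices` is hypothetical, and Python's list index `m - 1` stands for $B_m$ in the text. Using `searchsorted` with `side="left"` sends a confidence that lands exactly on an edge to the lower bin, which matches the right-closed intervals $I_m$.

```python
import numpy as np


def bin_indices(confidences, n_bins):
    """Hypothetical helper: return a list B where B[m - 1] holds the indices
    of samples whose confidence lies in ((m - 1) / M, m / M], i.e. B_m.
    A confidence of exactly 0 is placed in the first bin."""
    confidences = np.asarray(confidences, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)        # 0, 1/M, ..., 1
    # Search only the inner edges; side="left" makes each interval right-closed.
    bin_ids = np.searchsorted(edges[1:-1], confidences, side="left")
    return [np.flatnonzero(bin_ids == m) for m in range(n_bins)]
```

For example, with `n_bins=10` a confidence of 0.12 goes to `B[1]` (the interval (0.1, 0.2]) and 0.95 goes to `B[9]`.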

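For each bin we can compute the bin accuracy (the y-axis of our graph) using the following formula:

$$\operatorname{acc}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \mathbf{1}(\hat{y}_i = y_i)$$

Here, $\mathbf{1}(\hat{y}_i = y_i)$ is 1 if the label $y_i$ of example $i$ belongs to the same class as the prediction $\hat{y}_i$, and 0 otherwise.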

To sum up, we compute the y-axis of our plot by first segmenting all predicted confidence scores into $M$ bins. Each confidence score is associated with a predicted class $\hat{y}_i$. For each bin, we count the examples whose label $y_i$ matches the predicted class $\hat{y}_i$ and divide by the total number of examples in the bin.
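The whole y-axis computation can be sketched in a few lines of NumPy, assuming an `(N, K)` array of predicted probabilities and integer labels. The name `bin_accuracies` is hypothetical; index `m` of the returned arrays corresponds to $B_{m+1}$, and empty bins get `NaN` because $\operatorname{acc}(B_m)$ is undefined when $|B_m| = 0$.

```python
import numpy as np


def bin_accuracies(probs, labels, n_bins=10):
    """Hypothetical helper computing acc(B_m) for every bin.

    probs:  (N, K) predicted class probabilities; labels: (N,) true class ids.
    Returns (accuracies, counts); accuracies[m] is NaN when bin m is empty.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)                  # confidence of predicted class
    predictions = probs.argmax(axis=1)               # y_hat_i
    correct = (predictions == labels).astype(float)  # indicator 1(y_hat_i == y_i)

    # Right-closed bins ((m-1)/M, m/M]; see searchsorted side="left".
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.searchsorted(edges[1:-1], confidences, side="left")

    counts = np.bincount(bin_ids, minlength=n_bins)                  # |B_m|
    hits = np.bincount(bin_ids, weights=correct, minlength=n_bins)   # correct per bin
    accuracies = np.full(n_bins, np.nan)
    nonempty = counts > 0
    accuracies[nonempty] = hits[nonempty] / counts[nonempty]
    return accuracies, counts
```

`np.bincount` does the per-bin counting in one pass, so there is no Python loop over bins.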

Code

Here is the code to compute and plot the calibration curves for your models in matplotlib.
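A minimal plotting sketch, offered as an illustration rather than the author's original code. It assumes matplotlib and NumPy and takes a length-$M$ array of per-bin accuracies computed as described above, with `NaN` marking empty bins. The name `plot_reliability_diagram` is hypothetical. Bars are drawn at the bin midpoints; the dashed diagonal marks perfect calibration.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_reliability_diagram(accuracies, ax=None):
    """Hypothetical helper: draw a reliability diagram from per-bin accuracies.

    accuracies: length-M array; accuracies[m] is the accuracy of the
    (m + 1)-th bin, NaN for an empty bin (no bar is drawn for it).
    """
    accuracies = np.asarray(accuracies, dtype=float)
    n_bins = len(accuracies)
    width = 1.0 / n_bins
    centers = (np.arange(n_bins) + 0.5) * width   # midpoint of each bin
    if ax is None:
        _, ax = plt.subplots(figsize=(4, 4))
    filled = ~np.isnan(accuracies)
    ax.bar(centers[filled], accuracies[filled], width=width,
           edgecolor="black", label="Accuracy")
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray",
            label="Perfect calibration")
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_xlabel("Confidence")
    ax.set_ylabel("Accuracy")
    ax.legend(loc="upper left")
    return ax
```

Bars that fall below the diagonal indicate bins where the model is overconfident. To save the figure, call `plot_reliability_diagram(acc).figure.savefig("reliability.png")`.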

Hey there! I'm Huy and I do research in computer vision, visual search, and AI. You can get updates on new essays by subscribing to my RSS feed. Occasionally, I will send out interesting links on Twitter, so follow me if you like this kind of stuff.
