Skip to content

Tutorial

Binary classification metrics are all about testing the quality of your predicted labels against the actual labels you observed.

To start out, we will need example predictions (y_pred) and targets (y_true).

python imports & setup
import numpy as np
from rich import print
import matplotlib.pyplot as plt

np.set_printoptions(formatter={'float_kind':"{:.5g}".format})
y_true = np.array([0,1,0,0,1]).astype(bool)
y_pred = np.array([0,1,0,1,0]).astype(bool)

Basic Instantiation

Now, just instantiate the Contingent dataclass with your true and predicted target values.

from contingency import Contingent
M = Contingent(y_pred=y_pred, y_true=y_true)

print(M)
output

Contingent(
    y_true=array([[False,  True, False, False,  True]]),
    y_pred=array([[False,  True, False,  True, False]]),
    weights=None,
    TP=array([1]),
    FP=array([1]),
    FN=array([1]),
    TN=array([2]),
    PP=array([2]),
    PN=array([3]),
    P=array([2]),
    N=array([3]),
    PPV=masked_array(data=[0.5],
             mask=[False],
       fill_value=1e+20),
    NPV=masked_array(data=[0.6666666666666666],
             mask=[False],
       fill_value=1e+20),
    TPR=masked_array(data=[0.5],
             mask=[False],
       fill_value=1e+20),
    TNR=masked_array(data=[0.6666666666666666],
             mask=[False],
       fill_value=1e+20)
)

We now have access to properties that will return useful metrics from these contingency counts, such as:

print(M.mcc, M.F, M.G, sep='\n')
[0.16667]
[0.5]
[0.5]

Contingencies from Probabilities

Most ML systems do not output binary classifications directly, but instead output probabilities or weights.

y_prob = np.array([0.1,0.8,0.1,.7,.25])

Thresholding these will create an entire "family" of predictions as the threshold increases or lowers.

Contingent easily handles this as a simple broadcasting operation, using numpy. To access this functionality, produce a Contingent instance using the from_scalar() constructor with you scalar predictions:

M_batch = Contingent.from_scalar(y_true, y_prob)
print(M_batch.weights, M_batch.y_pred, sep='\n')

M_batch.y_pred.shape
[0 1e-05 0.21429 0.85714 0.99999 1]
[[ True  True  True  True  True]
 [ True  True  True  True  True]
 [False  True False  True  True]
 [False  True False  True False]
 [False  True False False False]
 [False False False False False]]
(6, 5)

Note how the number of positives decreases as the threshold increases (downward, increasing with each row).

Likewise, we can see that the set of metrics is now vectorized as well, since each threshold implies a different set of TP,FP, FN, and TN counts:

print(M_batch.mcc, M_batch.F, M_batch.G, sep='\n')
[0 0 0.66667 0.16667 0.61237 0]
[0.57143 0.57143 0.8 0.5 0.66667 0]
[0.63246 0.63246 0.8165 0.5 0.70711 0]

Expected Values

We would like a single score that summarizes our classifier's performance across all thresholds.

This can traditionally be done using Average Precision Score (APS), which is the average precision weighted across the recall scores. Alternatively, Contingent.expected can calculate the expected value of each provided score type across the set of unique threshold values.

for score in ('aps', 'mcc', 'F'):
    print(M_batch.expected(score))
0.8333333333333333
0.39492652768935094
0.6481253367346939

Optional Plotting Utilities

For those of us that are consistently performing threshold sensitivity analyses, a Precision-Recall (P-R) curve probably feels like an old friend. Communicating these with respect to the aggregate scores like F and G can be tricky, so we've provided a simple template matplotlib.axes.Axes object. You can access this template by importing the included plot utility PR_contour for making nicely formatted P-R curve axes to plot your Contingent metrics on.

from contingency.plots import PR_contour

M_batch.expected('aps')
plt.figure(figsize=(5,5))
PR_contour()
plt.step(M_batch.recall, M_batch.precision, color='k', ls='--', where='post')

png

Tip

While the Contingent class does not have a method to automatically plot its own P-R curves on a contour like this, such functionality is planned to be added at a later time.