contingent

Classes:

  • Contingent

    dataclass to hold true and (batched) predicted values

Functions:

  • F1

    Partially applied f_beta with beta=1 (equal/no bias)

  • avg_precision_score

    Average precision score

  • f_beta

    Fᵦ score

  • fowlkes_mallows

    Fowlkes-Mallows (G), the geometric mean of precision and recall

  • matthews_corrcoef

    Matthews correlation coefficient (MCC)

  • precision

    TP/(TP+FP) i.e. Positive Predictive Value

  • recall

    TP/(TP+FN) i.e. True Positive Rate

Contingent dataclass

dataclass to hold true and (batched) predicted values

Being a contingency library, this class is built around the idea of calculating which predictions are:

  • True
    • Predicted Negative (TN)
    • Predicted Positive (TP)
  • False
    • Predicted Negative (FN)
    • Predicted Positive (FP)

From these counts (TN, TP, FN, FP), all other contingency metrics are found.
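These counts can be sketched with plain NumPy boolean logic (hypothetical vectors; the library's internal `_TP`/`_FP`/`_FN`/`_TN` helpers are not reproduced here):

```python
import numpy as np

# Hypothetical ground truth and predictions
y_true = np.array([True, True, False, False, True])
y_pred = np.array([True, False, False, True, True])

TP = int(np.sum(y_true & y_pred))    # actually positive, predicted positive
FN = int(np.sum(y_true & ~y_pred))   # actually positive, predicted negative
FP = int(np.sum(~y_true & y_pred))   # actually negative, predicted positive
TN = int(np.sum(~y_true & ~y_pred))  # actually negative, predicted negative

print(TP, FP, FN, TN)  # 2 1 1 1
```

The four counts always partition the sample, so `TP + FP + FN + TN` equals the number of predictions.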

Parameters:

  • y_true

    (Bool[ndarray, feat]) –

    True positive and negative binary classifications

  • y_pred

    (Bool[ndarray, '*#batch feat']) –

    Predicted values, possibly batched (tensor)

  • weights

    (Num[ndarray, '*#batch'] | None, default: None ) –

    weight(s) for y_pred, useful for expected values of scores

Methods:

  • expected

    A convenience function to calculate the expected value of a score.

  • f_beta

    Fᵦ score (see f_beta)

  • from_scalar

    Take scalar predictions and generate (batched) Contingent

Attributes:

  • TP, FP, FN, TN: true positive, false positive, false negative, and true negative counts
  • PP, PN: predicted positive (TP + FP) and predicted negative (FN + TN) counts
  • P, N: actual positive (TP + FN) and actual negative (FP + TN) counts
  • PPV, NPV: positive and negative predictive values
  • TPR, TNR: true positive and true negative rates

Source code in src/contingency/contingent.py
@jaxtyped(typechecker=typechecker)
@dataclass
class Contingent:
    """`dataclass` to hold true and (batched) predicted values

    Being a contingency library, this class is built around the idea
    of calculating which predictions are:

    - True
        - Predicted Negative (TN)
        - Predicted Positive (TP)
    - False
        - Predicted Negative (FN)
        - Predicted Positive (FP)

    From these counts (TN, TP, FN, FP), all other contingency metrics
    are found.  

    Parameters:
        y_true: True positive and negative binary classifications
        y_pred: Predicted values, possibly batched (tensor)
        weights: weight(s) for y_pred, useful for expected values of scores

    """
    y_true: Bool[nda, 'feat']
    y_pred: Bool[nda, '*#batch feat']

    weights: Num[nda, '*#batch']|None = None

    TP: Num[nda, "..."] = field(init=False)
    FP: Num[nda, "..."] = field(init=False)
    FN: Num[nda, "..."] = field(init=False)
    TN: Num[nda, "..."] = field(init=False)


    PP: Num[nda, "..."] = field(init=False)
    PN: Num[nda, "..."] = field(init=False)
    P: Num[nda, "..."] = field(init=False)
    N: Num[nda, "..."] = field(init=False)


    PPV: Num[nda, "..."] = field(init=False)
    NPV: Num[nda, "..."] = field(init=False)
    TPR: Num[nda, "..."] = field(init=False)
    TNR: Num[nda, "..."] = field(init=False)

    def __post_init__(self):
        self.y_true = np.atleast_2d(self.y_true)
        self.y_pred = np.atleast_2d(self.y_pred)
        self.TP = _TP(self.y_true, self.y_pred)
        self.FP = _FP(self.y_true, self.y_pred)
        self.FN = _FN(self.y_true, self.y_pred)
        self.TN = _TN(self.y_true, self.y_pred)

        self.PP = self.TP + self.FP
        self.PN = self.FN + self.TN
        self.P = self.TP + self.FN
        self.N = self.FP + self.TN

        # self.PPV = np.divide(self.TP, self.PP, out=np.ones_like(self.TP), where=self.PP!=0.)
        self.PPV = np.ma.divide(self.TP, self.PP)
        self.NPV = np.ma.divide(self.TN, self.PN)
        self.TPR = np.ma.divide(self.TP, self.P)
        self.TNR = np.ma.divide(self.TN, self.N)


    @classmethod
    def from_scalar[T](
        cls: Type[T],
        y_true: PredProb,
        x:PredProb|None,
        subsamples:int|None=None
    )->T|None:
        """Take scalar predictions and generate (batched) Contingent

        By default, x is rescaled to [0,1] and used as the weights parameter
        for the Contingent constructor. Only unique values are needed, since
        the thresholding only changes with each unique prediction value.

        Uses numpy's `less_equal.outer` to accomplish fast, vectorized thresholding
        and enable rapid estimation of batched scores across all thresholds.


        Parameters:
            y_true: True pos/neg binary vector
            x: Scalar weights for relative prediction strength (positive)
            subsamples: Number of evenly spaced threshold values to use when subsampling the original data
        """
        # p, x_p = _quantile_tf(x)
        if x is None:
            warnings.warn("`None` value received, passing it on...", UserWarning)
            return None
        p, x_p = _minmax_tf(x)
        if subsamples:
            p = np.interp(
                np.linspace(0,1,subsamples),
                np.linspace(0,1,p.shape[0]),
                p
            )
        y_preds = np.less_equal.outer(p,x_p)

        return cls(y_true, y_preds, weights=p)



    def f_beta(self, beta=1):
        """Fᵦ score (see [`f_beta`][contingency.contingent.f_beta])"""
        return f_beta(beta, self)

    @property
    def F2(self):
        """F₂ score (see [`f_beta`][contingency.contingent.f_beta]) 

        """
        return f_beta(2., self)

    @property
    def F(self):
        """F₁ score (see [`f_beta`][contingency.contingent.f_beta])"""
        return F1(self)

    @property
    def recall(self):
        """see [`recall`][contingency.contingent.recall]"""
        return recall(self)

    @property
    def precision(self):
        """see [`precision`][contingency.contingent.precision]"""
        return precision(self)

    @property
    def mcc(self):
        """Matthew's Correlation Coefficient (see [`matthews_corrcoef`][contingency.contingent.matthews_corrcoef]) 
        """
        return matthews_corrcoef(self)

    @property
    def G(self):
        """Fowlkes-Mallowes score, see [`fowlkes_mallows`][contingency.contingent.fowlkes_mallows]"""
        return fowlkes_mallows(self)

    @typechecker
    def expected(self, mode: ScoreOptions='aps')->float:
        """
        A convenience function to calculate the expected value of a score.

        Usually for use in tandem with `Contingent.from_scalar()`, since scores will be given over a range of weights (via self.weights).

        Expected value is approximated with numerical integration via the trapezoidal rule.
        The exception is the average precision score, which is calculated over the range of recall scores using a simple first-order difference so that scores match those derived by scikit-learn.

        Parameters:
            mode: available scores that can be aggregated over the y_pred probabilities
        """
        if mode=='aps':
            return avg_precision_score(self)
        else:
            return trapezoid(getattr(self, mode), x=self.weights)

F property

F

F₁ score (see f_beta)

F2 property

F2

F₂ score (see f_beta)

G property

G

Fowlkes-Mallows score, see fowlkes_mallows

mcc property

mcc

Matthews correlation coefficient (see matthews_corrcoef)

precision property

precision

see precision

recall property

recall

see recall

expected

expected(mode: ScoreOptions = 'aps') -> float

A convenience function to calculate the expected value of a score.

Usually for use in tandem with Contingent.from_scalar(), since scores will be given over a range of weights (via self.weights).

Expected value is approximated with numerical integration via the trapezoidal rule. The exception is the average precision score, which is calculated over the range of recall scores using a simple first-order difference so that scores match those derived by scikit-learn.

Parameters:

  • mode

    (ScoreOptions, default: 'aps' ) –

    available scores that can be aggregated over the y_pred probabilities

Source code in src/contingency/contingent.py
@typechecker
def expected(self, mode: ScoreOptions='aps')->float:
    """
    A convenience function to calculate the expected value of a score.

    Usually for use in tandem with `Contingent.from_scalar()`, since scores will be given over a range of weights (via self.weights).

    Expected value is approximated with numerical integration via the trapezoidal rule.
    The exception is the average precision score, which is calculated over the range of recall scores using a simple first-order difference so that scores match those derived by scikit-learn.

    Parameters:
        mode: available scores that can be aggregated over the y_pred probabilities
    """
    if mode=='aps':
        return avg_precision_score(self)
    else:
        return trapezoid(getattr(self, mode), x=self.weights)
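The non-APS branch above integrates a score curve against `self.weights`. The trapezoidal rule it relies on can be sketched directly (hypothetical score curve; the rule is written out by hand here rather than imported):

```python
import numpy as np

# Hypothetical score values evaluated at a range of threshold weights
weights = np.linspace(0.0, 1.0, 5)
scores = np.array([0.2, 0.5, 0.8, 0.6, 0.1])

# Trapezoidal rule: average adjacent scores, weight by the interval width
expected = float(np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(weights)))
print(expected)  # 0.5125
```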

f_beta

f_beta(beta=1)

Fᵦ score (see f_beta)

Source code in src/contingency/contingent.py
def f_beta(self, beta=1):
    """Fᵦ score (see [`f_beta`][contingency.contingent.f_beta])"""
    return f_beta(beta, self)

from_scalar classmethod

from_scalar(y_true: PredProb, x: PredProb | None, subsamples: int | None = None) -> T | None

Take scalar predictions and generate (batched) Contingent

By default, x is rescaled to [0,1] and used as the weights parameter for the Contingent constructor. Only unique values are needed, since the thresholding only changes with each unique prediction value.

Uses numpy's less_equal.outer to accomplish fast, vectorized thresholding and enable rapid estimation of batched scores across all thresholds.
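The outer-product thresholding can be sketched as follows, assuming scores already rescaled to [0, 1] (hypothetical values; `_minmax_tf` is not reproduced here):

```python
import numpy as np

# Hypothetical rescaled prediction scores in [0, 1]
x_p = np.array([0.1, 0.4, 0.4, 0.9])
p = np.unique(x_p)  # one threshold per unique score value

# Row b of y_preds is the boolean prediction vector at threshold p[b]
y_preds = np.less_equal.outer(p, x_p)
print(y_preds.shape)  # (3, 4)
```

The lowest threshold predicts everything positive; the highest predicts positive only for the maximal score.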

Parameters:

  • y_true

    (PredProb) –

    True pos/neg binary vector

  • x

    (PredProb | None) –

    Scalar weights for relative prediction strength (positive)

  • subsamples

    (int | None, default: None ) –

    Number of evenly spaced threshold values to use when subsampling the original data

Source code in src/contingency/contingent.py
@classmethod
def from_scalar[T](
    cls: Type[T],
    y_true: PredProb,
    x:PredProb|None,
    subsamples:int|None=None
)->T|None:
    """Take scalar predictions and generate (batched) Contingent

    By default, x is rescaled to [0,1] and used as the weights parameter
    for the Contingent constructor. Only unique values are needed, since
    the thresholding only changes with each unique prediction value.

    Uses numpy's `less_equal.outer` to accomplish fast, vectorized thresholding
    and enable rapid estimation of batched scores across all thresholds.


    Parameters:
        y_true: True pos/neg binary vector
        x: Scalar weights for relative prediction strength (positive)
        subsamples: Number of evenly spaced threshold values to use when subsampling the original data
    """
    # p, x_p = _quantile_tf(x)
    if x is None:
        warnings.warn("`None` value received, passing it on...", UserWarning)
        return None
    p, x_p = _minmax_tf(x)
    if subsamples:
        p = np.interp(
            np.linspace(0,1,subsamples),
            np.linspace(0,1,p.shape[0]),
            p
        )
    y_preds = np.less_equal.outer(p,x_p)

    return cls(y_true, y_preds, weights=p)

F1

F1(Y: Contingent) -> ProbThres

Partially applied f_beta with beta=1 (equal/no bias)

Source code in src/contingency/contingent.py
def F1(Y:Contingent)->ProbThres:
    """Partially applied [`f_beta`][contingency.contingent.f_beta] with beta=1 (equal/no bias)
    """
    return  f_beta(1., Y)

avg_precision_score

avg_precision_score(Y: Contingent) -> float

Average precision score: precision-weighted first-order differences of recall, matching scikit-learn.
Source code in src/contingency/contingent.py
def avg_precision_score(Y:Contingent)->float:
    """Average precision score: precision-weighted first-order differences of recall (matches scikit-learn)."""
    return np.sum(np.diff(Y.recall[::-1], prepend=0) * Y.precision[::-1])

f_beta

f_beta(beta: float, Y: Contingent) -> ProbThres

Fᵦ score

Weighted harmonic mean of precision and recall, with recall weighted β times as much as precision.

Source code in src/contingency/contingent.py
def f_beta(beta:float, Y:Contingent)-> ProbThres:
    """Fᵦ score

    Weighted harmonic mean of precision and recall, with recall
    weighted β times as much as precision.
    """
    top = (1+beta**2)*Y.PPV*Y.TPR
    bottom = beta**2*Y.PPV + Y.TPR

    return np.ma.divide(top, bottom).filled(0.)
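A worked instance of the formula in the source above, with hypothetical precision and recall values:

```python
# Hypothetical precision (PPV) and recall (TPR)
ppv, tpr = 0.5, 1.0

def f_beta_score(beta: float, ppv: float, tpr: float) -> float:
    # Weighted harmonic mean; beta > 1 weights recall more heavily
    return (1 + beta**2) * ppv * tpr / (beta**2 * ppv + tpr)

f1 = f_beta_score(1.0, ppv, tpr)  # 2 * 0.5 * 1.0 / (0.5 + 1.0) = 2/3
f2 = f_beta_score(2.0, ppv, tpr)  # 5 * 0.5 * 1.0 / (2.0 + 1.0) = 5/6
```

With perfect recall and middling precision, the recall-favoring F₂ scores higher than F₁, as expected.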

fowlkes_mallows

fowlkes_mallows(Y: Contingent) -> ProbThres

Fowlkes-Mallows (G), the geometric mean of precision and recall.

Commonly used in unsupervised cases where synthetic test-data has been made available (e.g. MENDR, clustering validation, etc.)

Recently shown to be the limit of MCC as the number of True Negatives goes to infinity, making it useful for imbalanced, needle-in-haystack problems, like multi-cluster assignment.

Source code in src/contingency/contingent.py
def fowlkes_mallows(Y:Contingent)->ProbThres:
    """Fowlkes-Mallows (G), the geometric mean of precision and recall.

    Commonly used in unsupervised cases where synthetic test-data
    has been made available (e.g. MENDR, clustering validation, etc.)

    [Recently shown](https://arxiv.org/pdf/2305.00594) to be the limit
    of [MCC][contingency.contingent.matthews_corrcoef] as the number of
    True Negatives goes to infinity, making it useful for imbalanced,
    needle-in-haystack problems, like multi-cluster assignment.
    """
    return np.sqrt(recall(Y)*precision(Y))
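The geometric mean is easy to check by hand (hypothetical precision and recall):

```python
import math

# Hypothetical precision and recall
p, r = 0.5, 0.9

g = math.sqrt(p * r)  # geometric mean of precision and recall
print(round(g, 4))  # 0.6708
```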

matthews_corrcoef

matthews_corrcoef(Y: Contingent) -> ProbThres

Matthews correlation coefficient (MCC)

Also called the φ coefficient, it is similar to a Pearson correlation for binary variables.

Widely considered the fairest and least biased metric for imbalanced classification tasks (Chicco & Jurman, 2023).

Source code in src/contingency/contingent.py
def matthews_corrcoef(Y:Contingent)->ProbThres:
    """ Matthew's Correlation Coefficient (MCC)

    Also called the φ coefficient, it is similar to a Pearson correlation
    for binary variables.

    Widely considered the fairest and least biased metric for imbalanced
    classification tasks [(Chicco & Jurman, 2023)](https://doi.org/10.1186/s13040-023-00322-4)
    """
    m = np.vstack([Y.TPR,Y.TNR,Y.PPV,Y.NPV])
    l = np.sqrt(m).prod(axis=0)
    r = np.sqrt(1-m).prod(axis=0)
    return (l - r).filled(0)
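The rate-product identity in the source can be sanity-checked numerically; for a perfect classifier all four rates equal 1 and the score is 1 (the helper below is a hypothetical restatement, not part of the library):

```python
import numpy as np

def mcc_from_rates(tpr: float, tnr: float, ppv: float, npv: float) -> float:
    # MCC as a product over the four rates, mirroring the source above
    m = np.array([tpr, tnr, ppv, npv])
    return float(np.sqrt(m).prod() - np.sqrt(1 - m).prod())

print(mcc_from_rates(1.0, 1.0, 1.0, 1.0))  # 1.0 for a perfect classifier
```

A classifier with all four rates at 0.5 (chance level) scores 0, consistent with MCC behaving like a correlation.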

precision

precision(Y: Contingent) -> ProbThres

TP/(TP+FP) i.e. Positive Predictive Value

Source code in src/contingency/contingent.py
def precision(Y:Contingent)->ProbThres:
    """TP/(TP+FP) i.e. Positive Predictive Value"""
    return Y.PPV.filled(1.)

recall

recall(Y: Contingent) -> ProbThres

TP/(TP+FN) i.e. True Positive Rate

Source code in src/contingency/contingent.py
def recall(Y:Contingent)->ProbThres:
    """TP/(TP+FN) i.e. True Positive Rate"""
    return Y.TPR.filled(1.)