# contingent

Classes:

- `Contingent` – dataclass to hold true and (batched) predicted values

Functions:

- `F1` – Partially applied `f_beta` with beta=1 (equal/no bias)
- `avg_precision_score` – Average precision score
- `f_beta` – Fᵦ score
- `fowlkes_mallows` – Fowlkes-Mallows (G), the geometric mean of precision and recall
- `matthews_corrcoef` – Matthews Correlation Coefficient (MCC)
- `precision` – TP/(TP+FP), i.e. Positive Predictive Value
- `recall` – TP/(TP+FN), i.e. True Positive Rate
## Contingent (dataclass)

Dataclass to hold true and (batched) predicted values.
Being a contingency library, this class is built around the idea of calculating which predictions are:

- True
    - Predicted Negative (TN)
    - Predicted Positive (TP)
- False
    - Predicted Negative (FN)
    - Predicted Positive (FP)

From these counts (TN, TP, FN, FP), all other contingency metrics are found.
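As a minimal illustration in plain NumPy (not this library's API), the four counts fall out of simple boolean masks:

```python
import numpy as np

y_true = np.array([True, True, False, False, True])
y_pred = np.array([True, False, False, True, True])

tp = int(np.sum(y_true & y_pred))    # truly positive, predicted positive
tn = int(np.sum(~y_true & ~y_pred))  # truly negative, predicted negative
fp = int(np.sum(~y_true & y_pred))   # truly negative, predicted positive
fn = int(np.sum(y_true & ~y_pred))   # truly positive, predicted negative

print(tp, tn, fp, fn)  # 2 1 1 1
```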
Parameters:

- `y_true` (`Bool[ndarray, feat]`) – True positive and negative binary classifications
- `y_pred` (`Bool[ndarray, '*#batch feat']`) – Predicted values, possibly batched (tensor)
- `weights` (`Num[ndarray, '*#batch'] | None`, default: `None`) – Weight(s) for `y_pred`, useful for expected values of scores
Methods:

- `expected` – A convenience function to calculate the expected value of a score
- `f_beta` – Fᵦ score (see `f_beta`)
- `from_scalar` – Take scalar predictions and generate a (batched) Contingent

Attributes:

- `F` – F₁ score (see `f_beta`)
- `F2` – F₂ score (see `f_beta`)
- `G` – Fowlkes-Mallows score (see `fowlkes_mallows`)
- `mcc` – Matthews Correlation Coefficient (see `matthews_corrcoef`)
- `precision` – see `precision`
- `recall` – see `recall`
Source code in src/contingency/contingent.py
### expected

A convenience function to calculate the expected value of a score.

Usually used in tandem with `Contingent.from_scalar()`, since scores will be given over a range of weights (via `self.weights`).

The expected value is approximated by numerical integration with the trapezoidal rule. The exception is the Average Precision Score, which is calculated over the range of recall scores and uses a simple first-order difference so that results match those derived by scikit-learn.
Parameters:

- `mode` (`ScoreOptions`, default: `'aps'`) – Which of the available scores to aggregate over the `y_pred` probabilities
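A sketch of the trapezoidal aggregation in plain NumPy (illustrative only, not the method's source): given a score evaluated at each weight, the expected value is the integral over the weight range.

```python
import numpy as np

weights = np.linspace(0.0, 1.0, 101)  # e.g. thresholds produced by from_scalar
scores = weights ** 2                 # stand-in for a score evaluated per weight

# trapezoidal rule: average of adjacent scores times the weight spacing, summed
expected = float(np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(weights)))

print(expected)  # close to the exact integral of w^2 over [0, 1], i.e. 1/3
```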
### f_beta

Fᵦ score (see the module-level `f_beta`).
### from_scalar (classmethod)

`from_scalar(y_true: PredProb, x: PredProb | None, subsamples: int | None = None) -> T | None`

Take scalar predictions and generate a (batched) Contingent.

By default, `x` is rescaled to [0, 1] and used as the `weights` parameter for the Contingent constructor. Only unique values are needed, since the thresholding only changes at each unique prediction value.

Uses numpy's `less_equal.outer` to accomplish fast, vectorized thresholding and enable rapid estimation of batched scores across all thresholds.
Parameters:

- `y_true` (`PredProb`) – True pos/neg binary vector
- `x` (`PredProb | None`) – Scalar weights for relative prediction strength (positive)
- `subsamples` (`int | None`, default: `None`) – Number of evenly spaced threshold values to use when subsampling the original data
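To illustrate the `less_equal.outer` trick (a sketch of the idea, not the actual classmethod body): each row of the outer comparison is the boolean prediction vector at one threshold.

```python
import numpy as np

x = np.array([0.1, 0.4, 0.35, 0.8])  # scalar prediction strengths
thresholds = np.unique(x)            # thresholding only changes at unique values

# shape (n_thresholds, n_samples): row i marks samples with x >= thresholds[i]
y_pred = np.less_equal.outer(thresholds, x)

print(y_pred.shape)        # (4, 4)
print(y_pred.sum(axis=1))  # positives per threshold: [4 3 2 1]
```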
## F1

`F1(Y: Contingent) -> ProbThres`

Partially applied `f_beta` with beta=1 (equal/no bias).
## avg_precision_score

`avg_precision_score(Y: Contingent) -> float`

Average precision score.
## f_beta

`f_beta(beta: float, Y: Contingent) -> ProbThres`

Fᵦ score: the weighted harmonic mean of precision and recall, with β-times more bias toward recall.
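For reference, the standard Fᵦ formula written over scalar precision/recall values (a hypothetical helper; this module's `f_beta` instead takes a `Contingent`):

```python
def f_beta_score(beta: float, precision: float, recall: float) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta formula)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta_score(1.0, 0.5, 1.0))  # F1 for P=0.5, R=1.0 -> 2/3
```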
## fowlkes_mallows

`fowlkes_mallows(Y: Contingent) -> ProbThres`

Fowlkes-Mallows (G): the geometric mean of precision and recall.

Commonly used in unsupervised settings where synthetic test data is available (e.g. MENDR, clustering validation, etc.).

Recently shown to be the limit of MCC as the number of True Negatives goes to infinity, making it useful for imbalanced, needle-in-a-haystack problems like multi-cluster assignment.
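The score itself is a one-liner over scalar precision and recall (a hypothetical helper for illustration; the module's version operates on a `Contingent`):

```python
import math

def fowlkes_mallows_score(precision: float, recall: float) -> float:
    # G: geometric mean of precision and recall
    return math.sqrt(precision * recall)

print(fowlkes_mallows_score(0.25, 1.0))  # 0.5
```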
## matthews_corrcoef

`matthews_corrcoef(Y: Contingent) -> ProbThres`

Matthews Correlation Coefficient (MCC).

Also called the φ coefficient, it is analogous to a Pearson correlation for binary variables.

Widely considered the fairest/least biased metric for imbalanced classification tasks (Chicco & Jurman, 2023).
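From the four contingency counts the coefficient has a familiar closed form (a scalar sketch, not this function's batched signature); the large-TN case also illustrates the Fowlkes-Mallows limit noted above:

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    # phi coefficient over the 2x2 contingency table
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

print(mcc(5, 5, 0, 0))      # perfect classifier -> 1.0
print(mcc(2, 10**9, 1, 1))  # TN >> rest: approaches G = sqrt(P * R) = 2/3
```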
## precision

`precision(Y: Contingent) -> ProbThres`

TP/(TP+FP), i.e. Positive Predictive Value.

## recall

`recall(Y: Contingent) -> ProbThres`

TP/(TP+FN), i.e. True Positive Rate.
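Both definitions reduce to one-line ratios over the counts (hypothetical scalar helpers for illustration; the module's versions operate on a `Contingent` across thresholds):

```python
def precision_score(tp: int, fp: int) -> float:
    # positive predictive value: fraction of predicted positives that are true
    return tp / (tp + fp)

def recall_score(tp: int, fn: int) -> float:
    # true positive rate: fraction of actual positives that were found
    return tp / (tp + fn)

print(precision_score(3, 1), recall_score(3, 2))  # 0.75 0.6
```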