Developer Name: Oz Forensics LLC | Algorithm Name: oz_002 | Algorithm Type: 1:1 Verification

Date of Algorithm Submission: 2021-02-05 | Date of Report Card Generation: 2024-03-27

The report shows accuracy improvements over time for this developer. The traces correspond to the datasets named in the legend. The FMR is fixed independently for each dataset to the value given in the y-axis label.

For each dataset, the panels show false non-match rates vs. false match rates for oz_002 and several of the most accurate algorithms listed in the caption. The most accurate algorithms vary by dataset. When negative values appear on the vertical axis, they are logarithms of FNMR. Use mouseover to see FNMR, FMR and threshold values.

To inform threshold setting, the two panels show, respectively, FMR and FNMR as a function of threshold. Many applications will use the threshold to target a specific FMR, established by policy. For a given threshold, FNMR variation is expected across datasets, due primarily to quality and ageing differences. FMR differences may be due to different demographic composition (see figure below) or other factors.
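As a rough illustration of the threshold-setting step described above, the following Python sketch picks a threshold from a set of impostor scores to meet a target FMR, then reads off FNMR on genuine scores at that threshold. It assumes that higher scores mean greater similarity and that a comparison is accepted when score >= threshold. The function names and the toy scores are hypothetical and are not taken from the evaluation.

```python
import math


def fmr(impostor_scores, threshold):
    """Fraction of impostor comparisons accepted (score >= threshold)."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)


def fnmr(genuine_scores, threshold):
    """Fraction of genuine comparisons rejected (score < threshold)."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)


def threshold_for_fmr(impostor_scores, target_fmr):
    """Lowest observed impostor score usable as a threshold with FMR <= target.

    Quadratic-time scan, kept simple for illustration. Returns math.inf when
    no observed score meets the target, i.e. everything would be rejected.
    """
    for candidate in sorted(set(impostor_scores)):
        if fmr(impostor_scores, candidate) <= target_fmr:
            return candidate
    return math.inf


if __name__ == "__main__":
    # Toy scores, assumed for illustration only.
    impostors = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.31, 0.40, 0.55, 0.70]
    genuines = [0.45, 0.60, 0.80, 0.90]
    t = threshold_for_fmr(impostors, target_fmr=0.1)
    print("threshold:", t)
    print("FMR:", fmr(impostors, t), "FNMR:", fnmr(genuines, t))
```

Because the threshold is calibrated on one set of impostor scores, applying it to a different dataset will generally yield a different FMR and FNMR, which is the variation the panels are meant to expose.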

For mugshot images, the panels show FNMR as a function of elapsed time between the initial enrollment image and the second verification image. The panels are for some more and less accurate algorithms, and the target of this report. The four traces correspond to images annotated with codes for black female, black male, white female, white male. The threshold is fixed for each algorithm to give FMR = 0.00001 over all impostor comparisons (approximately 10^8). For short time-lapses, the most accurate algorithms give very few errors (FNMR < 0.001) so that the bootstrap uncertainty estimates given by the blue ribbon are high. The dashed line gives the mean of the four demographic values.

The figure shows similarity scores for 12 genuine and 8 impostor image pairs used in the May 2018 paper https://doi.org/10.1073/pnas.1721355115 Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms (Phillips et al.). The threshold (red horizontal line) is a value calibrated to give FMR = 0.0001 on mugshot images. Points above the threshold correspond to pairs determined to be genuine, and points below the threshold correspond to pairs determined to be impostors. If the determined class (genuine or impostor) matches the real class, points will be blue; if not, red. An X represents face detection failure in either of the images in the pair. Note that the sample size (n=20) is small, and the figure may change substantially if larger or different sets are used. The images can be viewed at https://www.pnas.org/doi/suppl/10.1073/pnas.1721355115/suppl_file/pnas.1721355115.sapp.pdf, where Gen 01 corresponds to Same-Identity Pair 1, Gen 02 corresponds to Same-Identity Pair 2, and so on.

For women (left) and men, the panels show false non-match rates when mediocre border crossing photos are compared against high quality reference application portraits collected from individuals born in the country identified on the horizontal axis and aged either above or below 45 years at the time of the application photo. The square dots give the empirical FNMR point estimate. The vertical lines give bootstrap 95% confidence intervals around the point estimate. The intervals are wider when the country and age group is less well represented in this dataset. Overlapping intervals are an indication of no significant difference. Low FNMR values are synonymous with high accuracy.
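The report does not state which bootstrap variant is used. As one plausible reading, the sketch below computes a percentile bootstrap interval for FNMR at a fixed threshold. The function name, the number of resamples, and the choice to resample individual comparisons (rather than subjects) are all assumptions made for illustration.

```python
import random


def bootstrap_fnmr_ci(genuine_scores, threshold, n_resamples=2000,
                      level=0.95, seed=0):
    """Return (point estimate, lower bound, upper bound) for FNMR.

    Percentile bootstrap: resample the per-comparison error indicators
    with replacement and take quantiles of the resampled FNMR values.
    """
    rng = random.Random(seed)
    n = len(genuine_scores)
    # 1 marks a false non-match (genuine pair scoring below threshold).
    errors = [1 if s < threshold else 0 for s in genuine_scores]
    point = sum(errors) / n

    estimates = sorted(
        sum(rng.choices(errors, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * (n_resamples - 1))
    hi_idx = int((1 + level) / 2 * (n_resamples - 1))
    return point, estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    # Toy data, assumed for illustration: 5 errors in 100 genuine pairs.
    scores = [0.9] * 95 + [0.1] * 5
    print(bootstrap_fnmr_ci(scores, threshold=0.5))
```

With fewer comparisons in a country and age group, the resampled FNMR values spread out more, which is why sparsely represented groups show wider intervals.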

For non-mate comparisons of mugshots of black and white (B-W) males and females (M-F), the panels show false match rates for five algorithms: two for which on-diagonal demographic differentials are low, two for which they are high, and the target algorithm in this report. In the top row of panels the threshold is set for each algorithm to give FMR = 0.001 for white males, which is the demographic that usually gives the lowest FMR. In the second row the threshold is set to give a white-male FMR of 0.0001. This means the top right box is the same color in all panels of a row.

The following seven figures are aggregations or extracts of the full matrix of all cross-demographic FMR estimates. The 1:1 results here will be relevant to search applications if 1:N is implemented by just using the algorithm to execute N 1:1 comparisons.
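To see why 1:1 FMR matters for such a search, a common back-of-envelope relation can be sketched as follows. It assumes the N comparisons are statistically independent and share the same FMR, which real galleries only approximate; the function name is hypothetical and the relation is not quoted from this report.

```python
import math


def false_positive_rate_1_to_n(fmr, n_enrolled):
    """Chance that at least one of n_enrolled 1:1 comparisons falsely matches.

    Computes 1 - (1 - fmr) ** n_enrolled, using log1p/expm1 so that very
    small FMR values do not lose precision. Independence is assumed.
    """
    if fmr >= 1.0:
        return 1.0
    return -math.expm1(n_enrolled * math.log1p(-fmr))


if __name__ == "__main__":
    # Illustrative values only: the same FMR against growing gallery sizes.
    for n in (1, 1_000, 100_000):
        print(n, false_positive_rate_1_to_n(1e-5, n))
```

For small FMR the result is close to N times FMR, so any demographic elevation in 1:1 FMR is carried into the search setting roughly in proportion to the number of enrolled comparisons.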

For comparison of high quality application portraits, the panels show false match rates for the target algorithm operating at one fixed threshold (value in the legend). The upper row of panels corresponds to comparison of different-sex individuals, the second row to men compared with men, and the final row to women compared with women. Each column corresponds to individuals born in the identified country, with higher FMR placed to the right. The country of birth is used as a proxy for race. Within each panel, the 5x5 array corresponds to full cross comparison of people across 5 age groups. The color and text in a cell indicate log10 FMR, a low (blue) value indicating low chance of false match. Red values tend to occur when subjects are of the same age and same sex. It is common for women to give higher FMR than men. The oldest and youngest also tend to give elevated FMR.

For comparison of high quality application portraits, the panels show how FMR in women exceeds that in men for the target algorithm operating at one fixed threshold (value in legend). The color encodes log10 FMR(F)/FMR(M), and the text is a simple multiplier. The horizontal axis gives country of birth and these appear in order of maximum multiplier. The country of birth is used as a proxy for race.

For comparison of high quality application portraits, the panels show how FMR increases with increasing levels of demographic pairing shown on the y-axis. The target algorithm is operating at one fixed threshold (value in legend). The color and text encode log10 FMR. FMR values below 0.000001 are shown as that value. The horizontal axis gives country of birth and these appear in order of maximum multiplier.

For comparison of high quality application portraits using the target algorithm configured with a fixed threshold (given in the legend), the heatmap shows FMR when comparing photos of men, aged 30-50, who are born in the countries identified on the two axes. The country of birth is used as a proxy for race. The color and the text encode log10 FMR, such that a +1 difference indicates 10 times the FMR. The countries are grouped by geographic region. The block (diagonal) structure shows FMR tends to be elevated within region. FMR is often lowest in E. Europe.

For comparison of high quality application portraits using the target algorithm configured with a fixed threshold (given in the legend), the heatmap shows FMR when comparing photos of women, aged 30-50, who are born in the countries identified on the two axes. The country of birth is used as a proxy for race. The color and the text encode log10 FMR, such that a +1 difference indicates 10 times the FMR. The countries are grouped by geographic region. The block (diagonal) structure shows FMR tends to be elevated within region. FMR is often lowest in E. Europe.

For comparison of high quality application portraits using the target algorithm, the heatmap shows mean non-mate score when comparing photos of men, aged 30-50, who are born in the countries identified on the two axes. The country of birth is used as a proxy for race. The color encodes score. The countries are grouped by geographic region.

For comparison of high quality application portraits using the target algorithm, the heatmap shows mean non-mate score when comparing photos of women, aged 30-50, who are born in the countries identified on the two axes. The country of birth is used as a proxy for race. The color encodes score. The countries are grouped by geographic region.

Operating on visa images, the heatmap shows false match rates observed over impostor comparisons of faces from different individuals who have the given age pair. False matches are counted against a recognition threshold fixed globally to give FMR = 0.0001 over all impostor comparisons (on the order of 10^10). The text in each box gives the same quantity as that coded by the color. Light colors present a security vulnerability to, for example, a passport gate.

Operating on visa images, the heatmap shows false match rates observed over impostor comparisons of faces from different individuals who were born in the given region pair. False matches are counted against a recognition threshold fixed globally to give the target FMR in the plot title, computed over all impostor comparisons (on the order of 10^10). If text appears in a box, it gives the same quantity as that coded by the color. Grey indicates FMR is at the intended FMR target level. Light red colors present a security vulnerability to, for example, a passport gate. Each +1 increase in log10 FMR corresponds to a factor of 10 increase in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different.
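The kind of tabulation behind such a heatmap can be sketched as below: impostor comparisons are grouped by (enrollment group, verification group) and each cell's FMR is computed at one global threshold. The record layout, the function names, and the display floor are hypothetical choices for illustration, not the evaluation's actual code.

```python
import math
from collections import defaultdict


def fmr_by_group_pair(comparisons, threshold):
    """Per-cell FMR from (enrol_group, verif_group, score) impostor records.

    Cells are keyed by ordered pair, so (a, b) and (b, a) are separate
    estimates and need not agree.
    """
    counts = defaultdict(lambda: [0, 0])  # [false matches, comparisons]
    for enrol_group, verif_group, score in comparisons:
        cell = counts[(enrol_group, verif_group)]
        cell[0] += score >= threshold
        cell[1] += 1
    return {pair: fm / total for pair, (fm, total) in counts.items()}


def log10_fmr(fmr, floor=1e-6):
    """log10 of FMR for display; values below the assumed floor are clipped."""
    return math.log10(max(fmr, floor))


if __name__ == "__main__":
    # Toy records, assumed for illustration only.
    records = [
        ("A", "A", 0.90), ("A", "A", 0.10),
        ("A", "B", 0.20), ("B", "A", 0.95),
    ]
    for pair, rate in fmr_by_group_pair(records, threshold=0.5).items():
        print(pair, rate, log10_fmr(rate))
```

Keeping the ordered pair as the key mirrors the caption's remark that the matrix is not quite symmetric when enrollment and verification images differ.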

Operating on visa images, the heatmap shows false match rates observed over impostor comparisons of faces from different individuals who were born in the given country pair. False matches are counted against a recognition threshold fixed globally to give the target FMR in the plot title, computed over all impostor comparisons (on the order of 10^10). If text appears in a box, it gives the same quantity as that coded by the color. Grey indicates FMR is at the intended FMR target level. Light red colors present a security vulnerability to, for example, a passport gate. Each +1 increase in log10 FMR corresponds to a factor of 10 increase in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different.

Operating on visa images, the heatmap shows how the mean of the impostor distribution for the country pair (a,b) is shifted relative to the mean of the global impostor distribution, expressed as a number of standard deviations of the global impostor distribution. This statistic is designed to show shifts in the entire impostor distribution, not just tail effects that manifest as the anomalously high (or low) false match rates that appear in the previous figures. The countries are chosen to show that skin tone alone does not explain impostor distribution shifts. The figure is computed from same-sex and same-age impostor pairs.
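The shift statistic described above can be sketched in a few lines. This is a minimal sketch that assumes the population standard deviation of the global impostor scores is used as the unit (the report does not say whether a population or sample estimate is taken), and the function name and toy scores are hypothetical.

```python
from statistics import mean, pstdev


def impostor_mean_shift(pair_scores, global_scores):
    """Shift of a country pair's impostor mean, in global standard deviations.

    Positive values mean the pair's impostor scores sit above the global
    impostor mean; raises ZeroDivisionError if the global scores are constant.
    """
    return (mean(pair_scores) - mean(global_scores)) / pstdev(global_scores)


if __name__ == "__main__":
    # Toy scores, assumed for illustration only.
    global_impostors = [0.10, 0.20, 0.30, 0.40, 0.50]
    pair_ab = [0.35, 0.45, 0.55]
    print(impostor_mean_shift(pair_ab, global_impostors))
```

Because the statistic uses the whole distribution's mean, it can reveal a shift even when few scores cross the threshold, which is the contrast with the tail-driven FMR heatmaps drawn in the caption.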