FRTE Demographics
FRTE Demographic Effects Art
Credit: Natasha Hanacek/NIST

Latest 1:1 Report | Latest 1:N Report

Overview

Status
[2023-08-18] FRVT was split and renamed to FRTE and FATE.


This page summarizes and links to all FRTE data and reports related to demographic effects in face recognition.

2022-07-12 NIST Interagency Report 8429: FRTE Part 8: Summarizing Demographic Differentials PDF
2019-12-20 NIST Interagency Report 8280: FRTE Part 3: Demographic Effects PDF


The next section includes draft summary measures for demographic differentials for 1:1 recognition algorithms.

FRTE 1:1 Demographic Differentials Summary

The table, last updated on 2024-01-23, includes summary indicators for how the two fundamental error rates vary by age, sex, and race. These are false negative demographic effects, i.e. failures to associate two photos of an individual, and false positive demographic effects, i.e. incorrect association of two photos of different individuals. False negatives are strongly dependent on image quality, and poor photography of a face can induce a demographic effect. Two examples that will elevate false negative rates: 1. inadequate lighting or under-exposure of dark-skinned individuals, or over-exposure of fair-skinned subjects; 2. failure to adjust a camera for very tall or short individuals, which can lead to a pitch-angle variation. False positive variations occur even with good image quality; they arise when an algorithm produces similarity score distributions that are displaced for one demographic versus another. This can occur due to under-representation of a demographic in the image dataset used for algorithm training.

False Positive Demographics

False Positive Differentials

The table below, last updated on 2024-01-23, includes summary indicators for false positive demographic effects, i.e. incorrect associations of two photos of two different individuals, the rate of which differs by age, sex or race. False positives occur primarily due to anatomic similarity of the faces, and this occurs even in images of pristine quality. False positives can in principle also occur due to image artifacts, such as reflections or eye-glasses, but these are specific to particular recognition algorithms.
FMR values are measured over comparisons of two high quality frontal portrait images of two people of the same sex, same age group, and same region of birth. The threshold is fixed for each algorithm to give FMR of 0.0003 overall.


The rows list algorithms submitted to the 1:1 track of FRTE. The relative importance of demographic effects in FNMR versus FMR is highly application dependent. The columns are:
1: Algorithm name
2: Date algorithm was submitted to NIST
3: A summary FNMR value (so that readers can look at the more accurate algorithms first). Smaller values are better.
4: The best FMR value for any one demographic - this is the lowest across 5 regions, 5 age groups and two sexes
5: The worst FMR value for any one demographic - this is the highest across 5 regions, 5 age groups and two sexes
6: The ratio of worst to best FMR given in the prior two columns, i.e. how many times larger one is than the other. Smaller values are better
7: The ratio of worst FMR to arithmetic mean of FMR of all demographics
8: The ratio of worst FMR to geometric mean of FMR of all demographics
9: The arithmetic mean of absolute value of log10 of all ratios FMR / X where X = geometric mean of FMR, over all demographics. This measures the spread of FMR values around their geometric mean. Smaller values are better
10: The Gini coefficient for FMR, the mean absolute difference across all groups, divided by the mean
11: FMR E. Africa over FMR E. Europe - numerator and denominator are geometric means of 10 FMR ratios (5 age groups, 2 sexes). Ratios close to 1 are best, and ranked accordingly
12: FMR E. Asia over FMR E. Europe - numerator and denominator are geometric means of 10 FMR ratios (5 age groups, 2 sexes). Ratios close to 1 are best, and ranked accordingly
13: FMR S. Asia over FMR E. Europe - numerator and denominator are geometric means of 10 FMR ratios (5 age groups, 2 sexes). Ratios close to 1 are best, and ranked accordingly
14: FMR Female over FMR Male - numerator and denominator are geometric means of 25 FMR ratio estimates (5 regions, 5 age groups). Ratios close to 1 are best, and ranked accordingly
15: FMR 65+ over FMR (20-35] - numerator and denominator are geometric means of 10 FMR ratio estimates (5 regions, 2 sexes). Ratios close to 1 are best, and ranked accordingly
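The column definitions above can be illustrated with a minimal Python sketch. It is not NIST's code: the function names, the dictionary layout and the sample values are hypothetical. The Gini function uses the textbook form (mean absolute difference over all pairs of groups, divided by twice the mean); the column description does not state whether a factor of two or a small-sample correction is applied, so that choice is an assumption.

```python
import math
from itertools import product


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(rates):
    """Summarize per-group error rates (dict: group label -> rate > 0)."""
    values = list(rates.values())
    n = len(values)
    best, worst = min(values), max(values)
    amean = sum(values) / n
    gmean = geometric_mean(values)
    # Mean absolute difference over all ordered pairs of groups.
    mad = sum(abs(a - b) for a, b in product(values, repeat=2)) / (n * n)
    return {
        "best": best,                          # column 4
        "worst": worst,                        # column 5
        "worst_over_best": worst / best,       # column 6
        "worst_over_amean": worst / amean,     # column 7
        "worst_over_gmean": worst / gmean,     # column 8
        # Column 9: spread of the rates around their geometric mean.
        "mean_abs_log10": sum(abs(math.log10(v / gmean)) for v in values) / n,
        # Column 10: textbook Gini (assumed form, see lead-in).
        "gini": mad / (2 * amean),
    }


def ratio_of_geometric_means(cells, position, numerator, denominator):
    """Columns 11-15 style ratio.

    cells maps (region, age_group, sex) -> rate; position selects which
    element of the key (0, 1 or 2) is compared.
    """
    num = [v for k, v in cells.items() if k[position] == numerator]
    den = [v for k, v in cells.items() if k[position] == denominator]
    return geometric_mean(num) / geometric_mean(den)


# Hypothetical FMR values, for illustration only (not NIST results).
example = {"group_a": 0.0001, "group_b": 0.001, "group_c": 0.01}
summary = summarize(example)
```

The same `summarize` function applies unchanged to FNMR values per region of birth, since the false negative columns use the same definitions.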


A note on 1:N: While this table lists results for 1:1 algorithms, it is relevant to the subset of 1:N algorithms that implement 1:N search as N 1:1 comparisons followed by a sort operation. The demographic effects noted here will be material in 1:N operations and will be magnified if the gallery and the search stream include the affected demographic. This is discussed in the Annex, and in the academic literature PDF, PDF. Note that those publications use the binomial model of 1:N search; it is known that some 1:N algorithms do not implement 1:N simply as N 1:1 comparisons, in which case the model does not apply. Demographic effects there must be measured in separate empirical 1:N tests.
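As a rough illustration of the binomial model mentioned in this note, the sketch below computes the probability of at least one false match when a search is carried out as N independent 1:1 comparisons, each with the same FMR. The function name and the sample numbers are hypothetical, and the independence assumption is exactly the one that the note says does not hold for every 1:N algorithm.

```python
import math


def fpir_binomial(fmr, gallery_size):
    """P(at least one false match) over gallery_size independent comparisons.

    Requires 0 <= fmr < 1.
    """
    # 1 - (1 - fmr)**N, written with log1p/expm1 for accuracy at small fmr.
    return -math.expm1(gallery_size * math.log1p(-fmr))


# Hypothetical numbers: a baseline FMR and a demographic with 10x higher FMR,
# both searched against a gallery of 1000 people from the same demographic.
baseline = fpir_binomial(0.0003, 1000)
elevated = fpir_binomial(0.003, 1000)
```

Under these assumptions the baseline comes out near 0.26 and the elevated case near 0.95, which shows how a 1:1 differential can grow in a 1:N setting.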

False Negative Demographics

False Negative Differentials

The table includes summary indicators for false negative demographic effects, i.e. failures to associate two photos of an individual, the rate of which differs by age, sex or race. False negatives are strongly dependent on image quality, and poor photography of a face can induce a demographic effect. Two examples that will elevate false negative rates:
1. inadequate lighting or under-exposure of dark-skinned individuals, or over-exposure of fair-skinned subjects
2. failure to adjust a camera for very tall or short individuals can lead to a pitch-angle variation

The rows list algorithms submitted to the 1:1 track of FRTE. The relative importance of demographic effects in FNMR versus FMR is highly application dependent. The columns are:
1: Algorithm name
2: Date algorithm was submitted to NIST
3: A summary FNMR value (so that readers can look at the more accurate algorithms first)
4: The best FNMR value in any region of birth
5: The worst FNMR value in any region of birth
6: The ratio of worst to best FNMR, giving “how many times larger one is than the other”
7: The maximum FNMR over the arithmetic mean FNMR. The ideal result is 1, indicating parity
8: The maximum FNMR over the geometric mean FNMR. The ideal result is 1, indicating parity
9: The arithmetic mean of the absolute value of log10 of all ratios FNMR / X where X = geometric mean of FNMR, over all regions
10: The Gini coefficient for FNMR, the mean absolute difference across all groups, divided by the mean


FNMR is computed over comparisons of medium quality airport immigration entry photos with high quality reference portraits. False negatives are determined by comparing similarity scores with a threshold set for each algorithm to give FMR of 0.00001 overall.
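The procedure described here, fixing one threshold from the overall FMR and then reading off FNMR per group, can be sketched as follows. This is a simplified illustration rather than NIST's implementation: the function names, the convention that a comparison matches when its score is strictly above the threshold, and the integer scores are all assumptions. The example uses a target FMR of 0.2 only so that ten impostor scores suffice; the page's target of 0.00001 would need a far larger impostor set.

```python
def threshold_for_fmr(impostor_scores, target_fmr):
    """Threshold t such that the fraction of impostor scores > t is <= target_fmr."""
    ranked = sorted(impostor_scores, reverse=True)
    # Number of false matches tolerated; int() truncates, which is conservative.
    allowed = int(target_fmr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


def fnmr(genuine_scores, threshold):
    """Fraction of genuine (same-person) comparisons at or below the threshold."""
    misses = sum(1 for s in genuine_scores if s <= threshold)
    return misses / len(genuine_scores)


def fnmr_by_group(genuine_by_group, threshold):
    """Apply one global threshold to each group's genuine scores."""
    return {group: fnmr(scores, threshold)
            for group, scores in genuine_by_group.items()}


# Hypothetical similarity scores, for illustration only.
impostors = list(range(1, 11))
threshold = threshold_for_fmr(impostors, 0.2)
rates = fnmr_by_group(
    {"region_a": [9, 7, 10, 8], "region_b": [9, 10, 10, 9]},
    threshold,
)
```

Because the threshold is shared across groups, any difference between the per-group values in `rates` reflects the genuine score distributions alone.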
A note on 1:N: In addition, if the 1:1 algorithm were used to implement 1:N search (via N comparisons and a sort operation), the demographic effects noted here would be germane to the 1:N application. Note that some operational 1:N algorithms do not employ 1:1 algorithms in this way, and results for those algorithms must be measured in separate 1:N tests.

Contact Information

Inquiries and comments may be submitted to frvt@nist.gov.

Subscribe

Subscribe to the FRTE mailing list to receive emails when announcements or updates are made.

Related NIST Projects

Ongoing Face Evaluations

FRTE Projects

FRTE 1:1 Verification
FRTE 1:N Identification
FRTE Demographic Effects
FRTE Face Mask Effects
FRTE Paperless Travel
FRTE Twins Demonstration
FRTE FIVE

FATE Projects

FATE MORPH
FATE Quality
FATE PAD
FATE Age Estimation & Verification