Last Updated: April 12, 2024


Introduction

The IREX 10: Identification Track assesses iris recognition performance for identification (a.k.a. one-to-many) applications. Most flagship deployments of iris recognition operate in identification mode, providing services that include prison management, border security, expedited processing, and distribution of resources. The evaluation is administered at the Image Group’s Biometrics Research Lab (BRL), where developers submit their iris recognition software for testing over datasets sequestered at NIST. As an ongoing evaluation, IREX 10 accepts submissions at any time.

Leaderboard Tables

Two-eye
by Developer

The table below shows performance statistics for IREX 10 submissions. Results are shown only for the ‘most accurate’ submission from each developer, which is the one that produces the lowest FNIR @ FPIR = 0.01.

Accuracy Metric: FNIR (i.e., “miss rate”) at an FPIR of 0.01 (± 90% confidence)
Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: Both (left and right) iris images per enrollment template

The number after the ± indicates either the 90% confidence interval (for accuracy) or the standard deviation (for times and sizes).
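The FNIR-at-fixed-FPIR metric used throughout these tables can be sketched as follows. This is a minimal illustration, not NIST's actual implementation; the inputs (`mated`, `nonmated`) are hypothetical arrays of per-search top similarity scores.

```python
# Sketch of computing FNIR at a fixed FPIR from search results.
# Hypothetical inputs (not NIST's actual data structures):
#   nonmated: highest similarity score returned by each non-mated search
#   mated:    similarity score of the correct mate in each mated search

def fnir_at_fpir(mated, nonmated, fpir=0.01):
    """Return FNIR at the threshold that yields the target FPIR."""
    # Choose the threshold as the (1 - fpir) quantile of non-mated scores:
    # scores at or above it produce false positives at a rate of ~fpir.
    s = sorted(nonmated)
    k = int((1.0 - fpir) * len(s))
    threshold = s[min(k, len(s) - 1)]
    # A mated search "misses" if its mate scores below the threshold.
    misses = sum(1 for m in mated if m < threshold)
    return misses / len(mated)
```

In words: the threshold is set so that 1% of non-mated searches would return a false positive, and FNIR is the fraction of mated searches whose correct mate falls below that threshold.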

Two-eye
by Submission

The table below shows performance statistics for all submissions to IREX 10. Many developers submitted multiple times.

Accuracy Metric: FNIR (i.e., “miss rate”) at an FPIR of 0.01 (± 90% confidence)
Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: Both (left and right) iris images per enrollment template

The number after the ± indicates either the 90% confidence interval (for accuracy) or the standard deviation (for times and sizes).

Single-eye
by Developer

The table below shows performance statistics for IREX 10 submissions. Results are shown only for the ‘most accurate’ submission from each developer, which is the one that produces the lowest FNIR @ FPIR = 0.01.

Accuracy Metric: FNIR (i.e., “miss rate”) at an FPIR of 0.01 (± 90% confidence)
Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template

The number after the ± indicates either the 90% confidence interval (for accuracy) or the standard deviation (for times and sizes).

Single-eye
by Submission

The table below shows performance statistics for all submissions to IREX 10. Many developers submitted multiple times.

Accuracy Metric: FNIR (i.e., “miss rate”) at an FPIR of 0.01 (± 90% confidence)
Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template

The number after the ± indicates either the 90% confidence interval (for accuracy) or the standard deviation (for times and sizes).

DET Accuracy

Core accuracy for the identification task can be characterized by detection error trade-off (DET) plots. Generally, curves lower in a DET plot correspond to more accurate matchers. The plots are interactive through the use of the Plotly.js graphing library.
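A DET curve is traced by sweeping the decision threshold and recording the resulting (FPIR, FNIR) pair at each value. The sketch below illustrates the idea under the same hypothetical inputs as before (per-search top similarity scores); it is not the code used to produce the plots.

```python
# Sketch of generating the points of a DET curve from per-search scores.
# Hypothetical inputs: mated / nonmated top-candidate similarity scores.

def det_points(mated, nonmated):
    """Return (FPIR, FNIR) pairs swept over all observed score thresholds."""
    thresholds = sorted(set(mated) | set(nonmated))
    points = []
    for t in thresholds:
        # FPIR: fraction of non-mated searches scoring at or above t.
        fpir = sum(1 for s in nonmated if s >= t) / len(nonmated)
        # FNIR: fraction of mated searches whose mate scores below t.
        fnir = sum(1 for s in mated if s < t) / len(mated)
        points.append((fpir, fnir))
    return points
```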

Two-Eye Accuracy

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: Both (left and right) iris images per enrollment template

Single-Eye Accuracy

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template

Ranked Accuracy

Rank-based metrics are generally better at reflecting performance for investigational tasks, where the algorithm returns a list of candidates for an inspector to scrutinize further. The rank-10 “hit rate” is the fraction of searches that return the correct candidate within the top 10 candidates. The miss rate is one minus the hit rate.
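The rank-k hit and miss rates described above reduce to a simple count. The sketch below assumes a hypothetical `ranks` list, where each entry is the rank (1 = best) of the correct mate on that search's candidate list, or `None` if the mate was not returned.

```python
# Sketch of rank-based accuracy metrics (hypothetical inputs, not NIST code).

def hit_rate(ranks, k=10):
    """Fraction of searches whose correct mate appears within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)

def miss_rate(ranks, k=10):
    """Miss rate is one minus the hit rate."""
    return 1.0 - hit_rate(ranks, k)
```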

Two-eye
by Developer

The table below shows rank-based accuracy for IREX 10 submissions. Results are shown only for the ‘most accurate’ submission from each developer, which is the one that produces the lowest miss rate when only the top-ranked candidate for each search is considered.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: Both (left and right) iris images per enrollment template

Two-eye
by Submission

The table below shows rank-based accuracy for all submissions to IREX 10. Many developers submitted multiple times.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: Both (left and right) iris images per enrollment template

Single-eye
by Developer

The table below shows rank-based accuracy for IREX 10 submissions. Results are shown only for the ‘most accurate’ submission from each developer, which is the one that produces the lowest miss rate when only the top-ranked candidate for each search is considered.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template

Single-eye
by Submission

The table below shows rank-based accuracy for all submissions to IREX 10. Many developers submitted multiple times.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template

Computation Times

Computation times are measured as elapsed real time (i.e., “wall clock” time) as opposed to CPU time. Timing estimates were computed on unloaded machines with only a single process dedicated to biometric operations. The test machines are Dell PowerEdge M910 blades with dual Intel(R) Xeon(R) E5-2630 v4 CPUs @ 2.20GHz (10 cores per processor). Each template was created from images of both a person’s left and right eye. The images were typically 640x480 pixels.
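The wall-clock (rather than CPU-time) measurement described above might be sketched as follows; `make_template` is a hypothetical stand-in for a timed biometric operation, and this is not the harness NIST actually uses.

```python
# Sketch of measuring elapsed real ("wall clock") time for an operation,
# as opposed to CPU time.
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.monotonic()   # monotonic wall clock: immune to clock resets
    result = fn(*args)
    return result, time.monotonic() - start

# Hypothetical usage:
#   template, seconds = timed(make_template, left_image, right_image)
```

`time.monotonic()` is used rather than `time.process_time()` precisely because the evaluation reports elapsed real time, which includes I/O and any scheduling delays.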

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: Both (left and right) iris images per enrollment template


Previous IREX evaluations identified a speed-accuracy trade-off whereby the more accurate matchers tend to take longer to return search results. The plot below shows FNIR as a function of median search time for each matcher. FNIR is computed at an FPIR of 0.01.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: Both (left and right) iris images per enrollment template

Quality Assessment

Some developers’ submissions output estimates of sample quality for each processed iris image. The ANSI/NIST-ITL 1-2011 standard requires these estimates to be in the range 0 to 100 and to quantitatively express the predicted matching performance of the sample. Error-reject rate curves show how FNIR can be reduced by discarding the poorest quality samples in the test data. In our case, the quality of a search was set to the minimum quality assigned to the searched image and its enrolled mate.
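The error-reject computation described above can be sketched as follows. The inputs are hypothetical: each search is represented as a `(quality, missed)` pair, where `quality` is the minimum of the search image's and enrolled mate's quality values, and `missed` indicates the search failed at the operating threshold.

```python
# Sketch of an error-reject curve: FNIR after discarding the lowest-quality
# fraction of searches (hypothetical inputs, not NIST's implementation).

def error_reject_curve(searches, reject_fractions):
    """Return (rejected fraction, FNIR among retained searches) pairs."""
    ordered = sorted(searches)          # poorest-quality searches first
    n = len(ordered)
    curve = []
    for f in reject_fractions:
        kept = ordered[int(f * n):]     # discard the lowest-quality fraction f
        fnir = sum(1 for _, missed in kept if missed) / len(kept)
        curve.append((f, fnir))
    return curve
```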

The figure below demonstrates that FNIR (i.e. the ‘miss rate’) can be reduced by almost 20% by discarding just 1% of the poorest quality searches. Presumably, this 1% involved samples where the subject was blinking, moving, looking off-axis at the moment of capture, etc. The IREX III supplemental failure analysis found that matching failures for the most accurate matchers over a different dataset were almost entirely due to poor presentation of the iris.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template


The stacked barplot below shows how sample quality impacts the probability that a search will miss (i.e., fail to return the correct mate). Samples assigned low quality values should be more likely to miss. For Neurotechnology’s matcher, when the assigned value is 0 the probability of a miss is greater than 50%. FPIR is set to 0.01.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template

The sample qualities of left and right iris images acquired during the same session are expected to be highly correlated. In addition to having similar capture environments, dual-eye cameras acquire both images at nearly the same instant, so poor presentation of the irides (e.g., blinking or moving at the moment of capture) detrimentally affects both images. For this reason, matching both acquired images rather than just one yields only a moderate improvement in accuracy. The figure below shows the distribution of qualities, with each axis representing the quality of one of the iris images (left or right) acquired during the same capture session.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template


The acquisition protocol for OPS4 images has probably improved over time. Better iris cameras and capture environments are likely to have improved the quality of the acquired images. Iris recognition accuracy is highly dependent on the prevalence of very poor quality samples. Misses tend to occur when the subject was blinking, moving, or looking off-axis at the instant of capture. The figure below shows the prevalence of these very low quality samples in OPS4 for each capture year. Comparatively few images in OPS4 were collected prior to 2014, so results for these images are omitted. An iris sample was deemed to have very low quality if its quality value is among the lowest 2% (i.e., below the 2% quantile) of all images in OPS4.
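The 2%-quantile criterion above amounts to flagging each sample whose quality falls below a dataset-wide cutoff. A minimal sketch, with hypothetical inputs:

```python
# Sketch of flagging "very low quality" samples: those below the 2% quantile
# of quality values over the whole dataset (hypothetical inputs).

def very_low_quality_flags(qualities, fraction=0.02):
    """Return a boolean flag per sample: True if below the quantile cutoff."""
    s = sorted(qualities)
    cutoff = s[int(fraction * len(s))]   # value at the 2% quantile
    return [q < cutoff for q in qualities]
```

Grouping the resulting flags by capture year would then yield the per-year prevalence shown in the figure.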

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Single eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One iris image per enrollment template

Score-level Fusion

Combining the results from multiple submissions sometimes yields improved accuracy over individual submissions. In this section score-level fusion is used to combine search results from multiple submissions. Equal-weighted Neyman-Pearson fusion is used to merge candidate lists from different submissions into a single consolidated candidate list. The dissimilarity score associated with each candidate is normalized prior to fusion (see LFAR score). This normalized score is a measure of similarity rather than dissimilarity. Any candidate appearing on multiple lists is assigned a single fused score by summing the individual LFAR scores. The merged candidate list is then reordered by the LFAR scores.
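The merge-and-reorder step can be sketched as below. The inputs are hypothetical: each submission's candidate list is modeled as a dict mapping candidate identifiers to scores already normalized to similarities (standing in for the LFAR normalization; the normalization itself is not shown).

```python
# Sketch of equal-weighted score-level fusion of candidate lists
# (hypothetical inputs; scores assumed already LFAR-normalized similarities).

def fuse_candidate_lists(lists):
    """Merge candidate lists, summing scores for repeated candidates."""
    fused = {}
    for candidates in lists:
        for cid, score in candidates.items():
            # A candidate on multiple lists gets the sum of its scores.
            fused[cid] = fused.get(cid, 0.0) + score
    # Reorder the consolidated list by fused score, best first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```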

Only fusion results that yield an improvement in accuracy over the individual submissions are shown.

Impact of Enrollment Size

Accuracy is impacted by the size of the enrollment database (a.k.a. the gallery size). Identification of the correct mate is expected to be more difficult for larger enrollment database sizes. The figure below plots FNIR (at FPIR = 0.01) as a function of enrollment database size.

Dataset: Operational Dataset 4th pull (stats on OPS4 images)
Samples used: Both eyes
Enrollment Method: Both (left and right) iris images per enrollment template

Some apparent trends may be the result of random variation. Results for the 10K enrollment size were computed from 140K searches; results for the 50K, 100K, and 500K enrollment sizes were computed from 700K searches.

How to Participate

Participation is open to any commercial or academic organization free of charge. Instructions on building a submission can be found in the API and concept of operations (CONOPS) document. The CONOPS document is supplemented by the frvt1N.h and frvt_structs.h header files. To assist with development, a minimal working “stub” (a.k.a. null implementation) is also available. See also our FAQ.

All algorithm submissions, a signed participation agreement, and the developer’s public key must be submitted through the FRTE/FATE/IREX Submission Form, which requires that encrypted files be provided as a download link from a generic HTTP server (e.g., Google Drive). We cannot accept Dropbox links. NIST will not register, or establish any kind of membership, on the provided website. Participants can submit their algorithm(s), participation agreement, and GPG key at the same time via the submission form.

Participants are allowed to submit an implementation once every 4 calendar months.

Please send comments and recommendations to irex@nist.gov.

Contact Info

Inquiries and comments may be submitted to irex@nist.gov. Subscribe to the IREX mailing list to stay up-to-date on all IREX-related activities.